Open SuperTango opened 7 years ago
Hi @SuperTango, thanks for your interest in B2 CLI. You can already influence speed to some extent with the --threads N parameter. If that is not sufficient for you, could you please describe your use case, so that we can better understand it?
Thanks @ppolewicz. My use case is pretty simple: I only want to use a percentage of the available bandwidth. For example, my outbound pipe from the datacenter where my Linux machine is has a max throughput of about 200kBps; however, for backups, I want to ensure we use at most 75kBps.
It is possible to implement such a limiter, but doing it well in our environment is not easy, as we support many threads. I could not find a good open-source module that would do the heavy lifting, and I have spent some time searching for one.
Have you tried using trickle?
This is also a feature I was looking for a while ago. I've tried trickle back then and wasn't able to limit the bandwidth. I don't know what the problem was, so maybe there is a workaround. I've just moved the backup job to the middle of the night, so it's no priority for me.
I think you can also use iptables to limit bandwidth per destination. This will not allow you to set different limits if you run two sync processes concurrently.
I have researched this further and I got interested in writing something like it, just because I found lots of questions about this and no answers other than "use urlgrabber" (which is a libcurl wrapper). But first I need to deal with another challenge in b2 cli, so I'll leave it unassigned.
I don't think it is worth implementing this just for b2 CLI, but it can be made abstract enough to become useful elsewhere.
If someone is going to work on this, please post here so that we can coordinate.
I think this is a pretty core feature for any backup (especially a sync) solution. Not flooding the network when performing a backup of potentially Terabytes of data is a requirement for me, not a "nice to have".
I haven't looked at the B2 command line tool codebase, but I've implemented a simple yet effective throttling solution for another product I worked on a long time ago. It wasn't particularly difficult, but we were writing to sockets directly (not using a 3rd party lib). With many threads, having each thread use 1/N of the bandwidth (N = number of threads) is good enough for this use case.
Sync should be smart - if there is an upload limit and a download limit, it should maximize the usage of both resources to minimize the session time, right? If only the limits are added, then likely first the bottleneck will be on uploading and then the bottleneck will be on downloading.
Another issue is that the number of parallel uploads/downloads will change over time as new tasks are scheduled and executed. A simple 1/N would be quite inefficient when compared to a smart one.
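To make the 1/N discussion above concrete, here is a minimal sketch of one alternative: a single token bucket shared by all transfer threads, so the limit applies to the total regardless of how many threads are active at a given moment. This is not code from b2 CLI or b2sdk; the class name TokenBucket, its parameters, and the consume method are hypothetical names chosen for illustration.

```python
import threading
import time


class TokenBucket:
    """Hypothetical thread-safe byte-rate limiter shared by all transfer threads."""

    def __init__(self, rate_bytes_per_sec, capacity_bytes,
                 clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate_bytes_per_sec)    # long-term average limit
        self.capacity = float(capacity_bytes)    # largest burst allowed
        self._tokens = float(capacity_bytes)     # start with a full bucket
        self._clock = clock                      # injectable for testing
        self._sleep = sleep
        self._last = clock()
        self._lock = threading.Lock()

    def consume(self, nbytes):
        """Block the calling thread until nbytes may be sent."""
        if nbytes > self.capacity:
            raise ValueError("chunk is larger than the bucket capacity")
        while True:
            with self._lock:
                # Refill according to elapsed time, capped at capacity.
                now = self._clock()
                elapsed = now - self._last
                self._tokens = min(self.capacity,
                                   self._tokens + elapsed * self.rate)
                self._last = now
                if self._tokens >= nbytes:
                    self._tokens -= nbytes
                    return
                # Not enough tokens yet: compute how long the deficit takes.
                wait = (nbytes - self._tokens) / self.rate
            # Sleep outside the lock so other threads are not blocked.
            self._sleep(wait)
```

Each worker thread would call consume(len(chunk)) on the same bucket before writing a chunk to the network. For the 75kBps case mentioned earlier, that could be TokenBucket(75_000, 16_384), assuming chunks of at most 16 KiB. Because the bucket is shared, a lone active thread can use the whole allowance, and the share per thread adjusts as tasks start and finish. It does not address the separate question of balancing upload against download limits in sync.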
If you would be willing to contribute some code to b2 CLI, it would be very welcome! We encourage outside contributors to perform changes on our codebase. Many such changes have been merged already. In order to make it easier to contribute, core developers of this project:
Trickle works for me, an example is trickle -s -u 200 b2 sync --threads 1 /src b2://dst
Does b2 download_file_by_id use threads as well? I'm using it to get specific versions of files and it saturates my bandwidth and sometimes causes issues. I will try @rwky's trickle example. Are there any plans to implement --threads N on b2 download_file_by_id? Thanks!
In the current version it uses threads to parallelize downloads (it's required by the b2 integration checklist); however, the number of threads is not changeable from the CLI yet.
The uploading/downloading machinery in b2sdk is being reworked as we speak, and one of the many improvements will be the ability to change the number of upload and download threads, or maybe even provide native bandwidth limiters, as somewhat more global settings, so that you can tweak them for download, upload, sync, copy and metadata operations (sync internally listing the contents of the bucket also consumes bandwidth). Bandwidth limiting is not planned in the initial scope of the rework, but the new structure of the code goes a long way towards enabling it.
We really need to have this in b2 CLI directly.
I see some people are suggesting trickle here; however, note that trickle does NOT work with Python 3.x, only Python 2.x. You cannot use trickle to limit the bandwidth utilization of python3 scripts: it will fail silently, with no limit applied.
Edit: for those looking for some kind of solution, if you can throttle the NIC of the host doing uploads, you can do so for the duration of the upload; however, this is only viable when you have nothing else running on the host. And this is not really a solution to this issue per se.
Trickle works for me, an example is
trickle -s -u 200 b2 sync --threads 1 /src b2://dst
That doesn't appear to have any impact for me. Is it possibly because trickle has no effect on Python 3 scripts? I am using Ubuntu 20.04 with B2 version 3.2.1 and trickle version 1.07.
My command (in case I am doing it wrong):
trickle -v -s -u 1 -t 1 b2 sync \
--delete \
--threads 1 \
$FOLDER_TO_BACKUP \
b2://${B2_BUCKET_NAME}/test
It looks like the Ubuntu version of trickle from apt doesn't work very well. That bug report says you should just compile it from source, then it will work. Maybe do that, as opposed to implementing rate limiting in every single program you will ever use in a constrained environment.
Actually, the best way to solve it permanently would be to bug Ubuntu to fix their trickle to work with python3.
I've had a bit of luck with the Ubuntu packaged version of Trickle by setting the number of b2 threads to 1.
Can we get throttling control in the b2 command line tool (especially for sync)?
Thanks.