Backblaze / B2_Command_Line_Tool

The command-line tool that gives easy access to all of the capabilities of B2 Cloud Storage

Throttling (for sync and other) #310

Open SuperTango opened 7 years ago

SuperTango commented 7 years ago

Can we get throttling control in the b2 command line tool (especially for sync)?

Thanks.

ppolewicz commented 7 years ago

Hi @SuperTango, thanks for your interest in B2 CLI. You can already influence speed to some extent with the --threads N parameter. If that is not sufficient for you, could you please describe your use case so that we can better understand it?

SuperTango commented 7 years ago

Thanks @ppolewicz. My use case is pretty simple: I only want to use a percentage of the available bandwidth. For example, the outbound pipe from the datacenter where my Linux machine lives has a maximum throughput of about 200kBps, but for backups I want to ensure we use at most 75kBps.

ppolewicz commented 7 years ago

It is possible to implement such a limiter, but doing it well in our environment is not easy, as we support many threads. I could not find a good open-source module that would do the heavy lifting, and I have spent some time searching for one.

Have you tried using trickle?

svonohr commented 7 years ago

This is also a feature I was looking for a while ago. I tried trickle back then and wasn't able to limit the bandwidth. I don't know what the problem was, so maybe there is a workaround. I've just moved the backup job to the middle of the night, so it's not a priority for me.

ppolewicz commented 7 years ago

I think you can also use iptables to limit bandwidth per destination. This will not allow you to set different limits if you run two sync processes concurrently.

I have researched this further and I got interested in writing something like it, just because I found lots of questions about this and no answers other than "use urlgrabber" (which is a libcurl wrapper). But first I need to deal with another challenge in b2 cli, so I'll leave it unassigned.

I don't think it is worth implementing this just for b2 CLI, but it can be made abstract enough to become useful elsewhere.

If someone is going to work on this, please post here so that we can coordinate.

SuperTango commented 7 years ago

I think this is a pretty core feature for any backup (especially a sync) solution. Not flooding the network when performing a backup of potentially Terabytes of data is a requirement for me, not a "nice to have".

I haven't looked at the B2 command line tool codebase, but I implemented a simple yet effective throttling solution for another product I worked on a long time ago. It wasn't particularly difficult, but we were writing to sockets directly (not using a 3rd party lib). With many threads, having each thread use 1/N of the bandwidth (N = number of threads) is good enough for this use case.
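For readers who want to see what the 1/N idea could look like, here is a minimal Python sketch. It is not taken from the b2 CLI codebase or from the product mentioned above; the class name FixedShareThrottle, its method wait_for, and the injectable clock and sleep parameters are all hypothetical and exist only for illustration. It assumes each thread owns its own throttle and calls it before writing every chunk.

```python
import time


class FixedShareThrottle:
    """Paces ONE thread at total_rate / n_threads bytes per second.

    Hypothetical helper for illustration; not part of b2 CLI or b2sdk.
    """

    def __init__(self, total_bytes_per_sec, n_threads,
                 clock=time.monotonic, sleep=time.sleep):
        # Each thread gets a fixed 1/N share of the total budget.
        self.rate = total_bytes_per_sec / n_threads
        self._clock = clock
        self._sleep = sleep
        # Earliest moment at which the next chunk may be sent.
        self._next_allowed = clock()

    def wait_for(self, nbytes):
        """Block until this thread may send nbytes, then reserve their time slot."""
        now = self._clock()
        if self._next_allowed > now:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        # Sending nbytes at self.rate occupies nbytes / rate seconds.
        self._next_allowed = now + nbytes / self.rate
```

With a 75 kB/s total budget and 3 threads, each thread would create FixedShareThrottle(75_000, 3) and call wait_for(len(chunk)) before each socket write. The share is fixed, so bandwidth left unused by an idle thread is not handed to the busy ones.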

ppolewicz commented 7 years ago

Sync should be smart: if there is an upload limit and a download limit, it should maximize the usage of both resources to minimize the session time, right? If only the limits are added, the bottleneck will likely be uploading first and downloading later.

Another issue is that the number of parallel uploads/downloads will change over time as new tasks are scheduled and executed. A simple 1/N would be quite inefficient when compared to a smart one.
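One common way to avoid dividing the budget by a fixed N is a single token bucket shared by all transfer threads: idle threads take nothing, so whichever threads are active at the moment share the full budget. The sketch below is only an illustration of that technique, not how b2 CLI or b2sdk work; SharedTokenBucket, consume, and the clock and sleep parameters are hypothetical names. It also does not address balancing upload against download; that would need one bucket per direction plus scheduling logic.

```python
import threading
import time


class SharedTokenBucket:
    """One byte budget shared by any number of threads (illustrative only)."""

    def __init__(self, bytes_per_sec, capacity=None,
                 clock=time.monotonic, sleep=time.sleep):
        self.rate = float(bytes_per_sec)
        # Capacity bounds the burst size; it defaults to one second of traffic.
        self.capacity = float(capacity if capacity is not None else bytes_per_sec)
        self._tokens = self.capacity
        self._clock = clock
        self._sleep = sleep
        self._last = clock()
        self._lock = threading.Lock()

    def consume(self, nbytes):
        """Block until nbytes tokens are available, then take them."""
        if nbytes > self.capacity:
            raise ValueError("chunk larger than bucket capacity")
        while True:
            with self._lock:
                now = self._clock()
                # Refill according to elapsed time, capped at capacity.
                elapsed = now - self._last
                self._tokens = min(self.capacity,
                                   self._tokens + elapsed * self.rate)
                self._last = now
                if self._tokens >= nbytes:
                    self._tokens -= nbytes
                    return
                wait = (nbytes - self._tokens) / self.rate
            # Sleep outside the lock so other threads can make progress.
            self._sleep(wait)
```

Every worker would call consume(len(chunk)) on the same bucket instance before sending a chunk, so the total stays near the configured rate no matter how many tasks are running at that moment.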

If you would be willing to contribute some code to b2 CLI, it would be very welcome! We encourage outside contributors to perform changes on our codebase. Many such changes have been merged already. In order to make it easier to contribute, core developers of this project:

rwky commented 6 years ago

Trickle works for me, an example is trickle -s -u 200 b2 sync --threads 1 /src b2://dst

devhen commented 4 years ago

Does b2 download_file_by_id use threads as well? I'm using it to get specific versions of files and it saturates my bandwidth and sometimes causes issues. I will try @rwky's trickle example. Are there any plans to implement --threads N on b2 download_file_by_id? Thanks!

ppolewicz commented 4 years ago

In the current version it uses threads to parallelize downloads (this is required by the b2 integration checklist); however, the number of threads is not changeable from the CLI yet.

The uploading/downloading machinery in b2sdk is being reworked as we speak. One of the many improvements will be the ability to change the number of upload and download threads, or maybe even to provide native bandwidth limiters, as more global settings, so that you can tweak them for download, upload, sync, copy and metadata operations (sync listing the contents of the bucket internally also consumes bandwidth). Bandwidth limiting is not planned in the initial scope of the rework, but the new structure of the code goes a long way towards enabling it.

Addvilz commented 3 years ago

We really need to have this in b2 CLI directly.

I see some people are suggesting trickle here. However, note that trickle does NOT work with Python 3.x, only Python 2.x. You cannot use trickle to limit bandwidth utilization of python3 scripts; it will transparently fail.

Edit: for those looking for some kind of solution, if you can throttle the NIC of the host doing uploads, you can do so for the duration of the upload. However, this is only valid when you have nothing else running on the host, and it is not really a solution to this issue per se.

programster commented 2 years ago

@rwky wrote: "Trickle works for me, an example is trickle -s -u 200 b2 sync --threads 1 /src b2://dst"

That doesn't appear to have any impact for me. Is it possibly related to trickle having no effect on Python 3 scripts? I am using Ubuntu 20.04 with B2 version 3.2.1 and trickle version 1.07.

My command (in case I am doing it wrong):

trickle -v -s -u 1 -t 1 b2 sync \
  --delete \
  --threads 1 \
  "$FOLDER_TO_BACKUP" \
  "b2://${B2_BUCKET_NAME}/test"

ppolewicz commented 2 years ago

It looks like the Ubuntu version of trickle from apt doesn't work very well. There is a bug report saying that you should just compile it from source, and then it will work. Maybe do that, as opposed to implementing rate limiting in every single program you will ever use in a constrained environment.

Actually the best way to solve it permanently would be to bug Ubuntu to fix their trickle to work with python3.

blewa commented 2 years ago

I've had a bit of luck with the Ubuntu packaged version of Trickle by setting the number of b2 threads to 1.