adamchainz / dynamodb_utils

A toolchain for Amazon's DynamoDB to make common operations (backup, restore backups) easier.
ISC License
12 stars 1 forks source link

Fix throttling #1

Closed fr3nd closed 9 years ago

fr3nd commented 9 years ago

Throttling was always set to 1.0 because the max function was used instead of min

adamchainz commented 9 years ago

Thanks for your PR! I'm no longer using DynamoDB and these utils, so it's not going to be easy for me to test this (the tests also against DynamoDB Local) - even though it's a quite small change I always want to be sure. Have you confirmed this fix with dumps you've run, and how? Thanks :smile:

fr3nd commented 9 years ago

I still haven't been able to make throttle to run properly (did it ever work for you?), but in this case it's easy to see that "min" is the function you want to use.

If you leave it as "max", it will always return 1.0 because the allowed value is always between 1.0 and 0.01 (see lines 80 and 81):

    capacity_consumption = max(0.01, capacity_consumption)
    capacity_consumption = min(1.0, capacity_consumption)
adamchainz commented 9 years ago

I don't remember whether or not it worked for me.

I see that the change to min is at least required, but maybe there are other things to fix? Will look into testing it better.

adamchainz commented 9 years ago

I think the throttling code is right before this change. Yes capacity_consumption is constrained between 0.01 and 1.0 but that is a percentage that is mutliplied out in (capacity_consumption * total_capacity) / float(parallelism).

Running some numbers by in ipython with some test code we get, e.g.:

In [1]: # imaginary commandline parameters

In [2]: capacity_consumption = 0.5

In [3]: parallelism = 4

In [4]: # adapted from main()

In [5]: capacity_consumption = max(0.01, capacity_consumption)

In [6]: capacity_consumption = min(1.0, capacity_consumption)

In [7]: total_capacity = 1000

In [8]: capacity_per_process = max(
   ...:     1.0,
   ...:     (capacity_consumption * total_capacity) / float(parallelism)
   ...: )

In [9]: print(capacity_per_process)
125.0

125.0 read units per each of the 4 processes seems right to me for consuming 0.5 = 50% of the 1000 read units. Swapping for min would only restrict each process to 1.0 read units. The reason this max exists is so there is at least 1 read unit per process.

But thanks for looking! If you have any other changes, feel free to open another PR!

fr3nd commented 9 years ago

Ok, I understand it now. Thanks for taking some time investigating and sorry for the wrong push.