aerospike / act

Aerospike Certification Tool
www.aerospike.com
Other
188 stars 50 forks source link

Sleep Time Modification for Large Block IO Threads #30

Closed kpmckay closed 7 years ago

kpmckay commented 7 years ago

This modification makes the sleep time calculation tolerant to IOs that do not complete within the target time calculation.

gooding470 commented 7 years ago

The calculation should be ok as-is. If we have: int z = (int)(x - y); where x and y are unsigned, and x is less than y, while (x - y) yields a large unsigned value, the int typecast makes z negative.

Of course in our case x and y are unsigned 64-bit numbers, which makes it more involved.

If x and y are unsigned 64-bit numbers, and x is less than y, and the difference is more than ~2 billion (max signed int), z would be a large positive number. So it would be safer if z (our sleep_us) was declared an int64_t. But the difference corresponds to ~2 billion microseconds, or ~2000 seconds, which should be impossible.

Finally, the suggested fix is itself dangerous because it calls the clock twice. The first call may result in (target > (t1 - start)) being true, then for the sleep call, (target - (t2 - start)) may result in a large positive number, because (t2 - start) is now bigger than target, where (t1 - start) was not.

kpmckay commented 7 years ago

Understood about two calls to the timer being bad. Let me trace a run to see exactly where a large value is being passed to usleep.

gooding470 commented 7 years ago

You could try declaring sleep_us as an int64_t (signed), in the original code. That would take the possibility of target being less than (now - start) by more than ~2000 seconds out of the picture. Though it's hard to see how that would happen.

kpmckay commented 7 years ago

I don't think it's the ~2000s issue either, but I'll give this a shot as well. Thanks for the feedback. It will take a few days to get the data.

kpmckay commented 7 years ago

Adding some tracing shows that the actual issue is the accumulation of a large negative value in sleep_us. The target_us value grows slower than cf_getus()-start_us such that sleep_us eventually flips from negative to positive. Here are the values at the time of an overflow event:

count = 22742770
start_us = 504077123164
cf_getus() = 515890284479
cf_getus()-start_us = 11813161315
target_us = 9665677250
target_us-(cf_getus()-start_us) = -2147484065 (FFFF_FFFF_7FFF_FE5F)
sleep_us = 2147483231 (7FFF_FE5F)

In the sample immediately prior:

target_us = 9665676825
target_us-(cf_getus()-start_us) = -2147483407 (FFFF_FFFF_8000_00F1)
sleep_us = -2147483407 (8000_00F1) 

I'm not sure how ACT is supposed to react in this case, but to avoid the sleep timer from tripping I suppose that sleep_us can be kept in an int64_t until the cast into the usleep function.

kpmckay commented 7 years ago

I'm going to close this request since the original proposal was invalid.

gooding470 commented 7 years ago

We'll change the ACT code to use an int64_t instead of int for sleep_us. (Or, if you want to make a pull request for that, we'll take it...)

kpmckay commented 7 years ago

I'm just going to let you make the change since I keep screwing up these pull requests!

On Thu, Oct 12, 2017 at 5:36 PM, gooding470 notifications@github.com wrote:

We'll change the ACT code to use an int64_t instead of int for sleep_us. (Or, if you want to make a pull request for that, we'll take it...)

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/aerospike/act/pull/30#issuecomment-336318384, or mute the thread https://github.com/notifications/unsubscribe-auth/AUNAyGaKj7tumNbMUwwir3M9JQJvQn-Aks5srrCkgaJpZM4Py91Y .

gooding470 commented 7 years ago

Ok, thanks for finding this! We'll change it.

gooding470 commented 6 years ago

Hi, just letting you know that -- finally -- the fix for this is checked in. In addition to switching to int64_t, we also decided to stop the test long before the large block operations fall that far behind the target rate. We'll now stop the test when we fall 10 seconds behind the calculated target. If we fall that far behind, it essentially means the drive(s) can't keep up at the simulated write & defrag specification.