chaos / powerman

cluster power control
GNU General Public License v2.0

redfishpower: reduce polling interval in test mode #155

Closed chu11 closed 8 months ago

chu11 commented 8 months ago

Problem: When in test mode, the status polling interval is still set to 1 second. This can slow down tests.

Reduce the status polling interval to 1 millisecond when in test mode.


I yoinked a commit from #73 that allows setting the status polling interval for this.
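As a rough illustration of the change described above, here is a minimal sketch of choosing the status polling interval based on test mode. The constant and function names are hypothetical and are not taken from the redfishpower source; only the 1 second and 1 millisecond values come from this PR description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constants, in microseconds (names are NOT from powerman). */
#define STATUS_POLLING_INTERVAL_DEFAULT_USEC 1000000L /* 1 second */
#define STATUS_POLLING_INTERVAL_TEST_USEC       1000L /* 1 millisecond */

/* Return the delay to wait between two status polls. */
static long status_polling_interval_usec (bool test_mode)
{
    return test_mode ? STATUS_POLLING_INTERVAL_TEST_USEC
                     : STATUS_POLLING_INTERVAL_DEFAULT_USEC;
}

/* Total time spent waiting if a test needs 'polls' status polls. */
static long total_wait_usec (bool test_mode, int polls)
{
    assert (polls >= 0);
    return (long)polls * status_polling_interval_usec (test_mode);
}
```

Under these assumptions, a test that needs 10 polls waits 10 seconds at the default interval but only 10 milliseconds in test mode.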

chu11 commented 8 months ago

re-pushed, I put a refactor/cleanup commit in here as well.

garlick commented 8 months ago

It seems like the more significant part of this PR is making the status polling interval configurable, but the title makes it sound like a test-only change.

Isn't this an ideal case for an exponential backoff? E.g. start at a min interval and double the interval each time, until a max interval is reached. Keep going at the max interval forever until powerman kills you, I guess. For example see:
https://github.com/flux-framework/flux-core/blob/master/src/common/librouter/usock.h#L22
https://github.com/flux-framework/flux-core/blob/master/src/common/librouter/usock.c#L675

You could still make those values tunable but it may not be necessary to re-tune for new hardware or new firmware.
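The backoff suggested here could be sketched roughly as follows. This is not the flux-core usock code and not anything in powerman; the `struct backoff` type and both function names are made up for illustration, and the units (milliseconds) are an assumption.

```c
#include <assert.h>

/* Hypothetical backoff state (NOT a powerman or flux-core type).
 * All fields are in milliseconds. */
struct backoff {
    long min_ms;   /* first interval handed out */
    long max_ms;   /* cap: intervals never exceed this */
    long next_ms;  /* interval the next call will return */
};

static void backoff_init (struct backoff *b, long min_ms, long max_ms)
{
    assert (min_ms > 0 && max_ms >= min_ms);
    b->min_ms = min_ms;
    b->max_ms = max_ms;
    b->next_ms = min_ms;
}

/* Return the interval to wait now, then double the stored interval
 * for the next call, clamping it to max_ms.  Comparing against
 * max_ms / 2 avoids overflowing when doubling. */
static long backoff_next (struct backoff *b)
{
    long cur = b->next_ms;

    if (cur > b->max_ms / 2)
        b->next_ms = b->max_ms;
    else
        b->next_ms = cur * 2;
    return cur;
}
```

With `backoff_init (&b, 1000, 5000)`, successive calls to `backoff_next (&b)` yield 1000, 2000, 4000, 5000, and then 5000 for every call after that.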

chu11 commented 8 months ago

> It seems like the more significant part of this PR is making the status polling interval configurable, but the title makes it sound like a test-only change.

Ahhh, that's what I get for stealing a commit from another PR. Let me re-work this to make it just about the test_mode speed.

> Isn't this an ideal case for an exponential backoff? E.g. start at a min interval and double the interval each time, until a max interval is reached. Keep going at the max interval forever until powerman kills you, I guess.

That's a really good idea. I think we'll leave that for another PR, as I would want the exponential backoff to begin at around 1 second rather than 1 millisecond. From testing so far, it's clear that some nodes power on after 3-4 seconds (so maybe a 1->2->3 second backoff would be ideal?) and others take 20-50 seconds (so maybe capping at 5 seconds would be fine).
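To get a feel for the numbers above, here is a small sketch that counts how many status polls happen before a node is seen as on. It assumes a simplified model (wait one interval, then poll, doubling the interval up to a cap) and a hypothetical function name; it is not how redfishpower schedules polls.

```c
#include <assert.h>

/* Count status polls until the elapsed time reaches ready_s seconds.
 * The interval starts at min_s and doubles after each poll, capped at
 * max_s.  Passing min_s == max_s models a fixed polling interval.
 * Hypothetical helper for illustration only. */
static int polls_until_ready (long ready_s, long min_s, long max_s)
{
    long elapsed = 0;
    long interval = min_s;
    int polls = 0;

    assert (ready_s >= 0 && min_s > 0 && max_s >= min_s);
    while (elapsed < ready_s) {
        elapsed += interval;   /* wait one interval, then poll */
        polls++;
        interval = (interval > max_s / 2) ? max_s : interval * 2;
    }
    return polls;
}
```

In this model, with a 1 second start and a 5 second cap, a node that powers on after 4 seconds is seen on the 3rd poll and one that takes 50 seconds on the 12th, compared with 4 and 50 polls at a fixed 1 second interval. The price is that the faster node is only noticed at the 7 second mark, which may be why a gentler 1->2->3 progression is floated above.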

chu11 commented 8 months ago

re-pushed, greatly simplifying the PR to just reduce polling interval for test mode.