Closed tserong closed 3 years ago
Please ignore the lint error E: 1, 0: Module file has the wrong file permissions(expected 0o644): 0o664 (file-perms)
-- it's been like that forever, along with all the other modules files, and I'm not changing it now.
Thanks, I'll try to rework that slightly so it's less confusing. With the current implementation, if you set 240, it will actually run for slightly over 255 seconds, not 127.5, but I had to instrument it with a couple of calls to time.monotonic() at the start and end, then run it myself to verify that, which definitely means it's too confusingly implemented ;-)
I've reworked it somewhat, but TBH I'm not sure if it's less confusing or not :-/
The diff is a bit ugly - it's easier to view side-by-side/split than unified.
Note that I've changed the behaviour so the sleep interval only doubles until you get to a minute, then it just sleeps for a minute on each iteration, rather than doubling indefinitely.
If the rework looks sane, I'll squash the commits and update the commit messages properly.
(lint failure is file perms again and should be ignored for this PR)
Thanks @trociny! I've used delay = min(delay * 2, 60)
because I think that's neater, and I've also squashed the commits and fixed up the commit message.
(That last force push was just a rebase)
When restarting OSDs, we wait for them to become available before proceeding to the next OSD. Previosly, this wait was in a loop with a timeout of 60 seconds. This commit increases the default timeout to 120 seconds, and also makes it configurable in case that's still too short in some circumstances. If you need a different timeout, specify it in
/srv/pillar/ceph/stack/global.yml
, e.g.:Note that the way the loop is implemented, the sleep in the loop doubles on each iteration. The runtime of the loop will thus be pretty close to a power of 2, e.g.: 1, 2, 4, 8, 16, 32, 64, ... seconds. It won't stop when the exact timeout is reached, rather it will stop at approximately the next power of 2 seconds greater than the specified timeout.
Fixes: https://bugzilla.suse.com/show_bug.cgi?id=1192520 Signed-off-by: Tim Serong tserong@suse.com