indefinite suspension of computing when changing system clock

romw commented 9 years ago

Reported by Richard Haselgrove on 1 Oct 38246665 00:53 UTC

If you make a user error with the system clock in Windows XP, you can cause BOINC to stop processing indefinitely (or for a longer period than I have patience to wait).

To verify:

Set system clock 1 month forward. Note that BOINC immediately runs a benchmark.

Set system clock 1 month back (i.e. to correct time). Wait until next checkpoint for the current app. BOINC suspends computation for a benchmark, but according to the message log it doesn't actually start running the benchmark code.

Full message log posted at "Benchmarking bug - indefinite suspension of computing".

romw commented 9 years ago

Commented by Nicolas on 26 Feb 38247099 15:33 UTC

Wow, I really thought there was a ticket for this already.

Many problems appear when the system clock is changed. Most are impossible to solve, or so hard that it's not worth it.

For example, if your clock is 1 month ahead of the correct date and you contact a scheduler, the deferral time is stored as an absolute timestamp: when to contact the server again. If you then set your clock 1 month back (i.e. to correct time), communication with that project will be deferred for a month and a bit.
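One way to limit the damage from a stale absolute timestamp is to clamp it against the current time whenever it is read. This is only a sketch of that idea, not how the BOINC client handles deferrals: the function name `sanitize_next_contact` and the cap `MAX_DEFERRAL` are hypothetical, and times are plain seconds held in a `double`.

```cpp
#include <algorithm>

// Hypothetical upper bound on any scheduler deferral, in seconds.
// This value is an assumption chosen for illustration only.
constexpr double MAX_DEFERRAL = 24.0 * 3600.0;

// Given the stored absolute "next contact" time and the current wall-clock
// time, return a next-contact time that is never more than MAX_DEFERRAL
// ahead of now. A timestamp written while the clock was a month ahead is
// therefore cut down to at most one day once the clock is corrected.
double sanitize_next_contact(double next_contact, double now) {
    return std::min(next_contact, now + MAX_DEFERRAL);
}
```

With the clock corrected, a deferral stored as "now plus a month and a bit" comes back as "now plus one day", while ordinary short deferrals pass through unchanged.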

romw commented 9 years ago

Commented by Didactylos on 22 Aug 38247156 00:26 UTC

I think there are three ways to mitigate this:

Check for time conflicts during every server interaction. This would at least log a relevant message.
Check every stored time against the current time, looking for time-travel errors.
Subscribe to time-change events from the operating system.

Sadly, there is no quick fix. None of these methods (and really we need all of them, not just one) are particularly simple to implement.
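A portable way to look for "time-travel" without operating system events is to compare how far the wall clock moved with how far a monotonic clock moved over the same interval. The sketch below assumes nothing about the BOINC code base: `ClockSample`, `sample_clocks`, `wall_clock_jump`, `clock_jumped` and the 5-second tolerance are all made-up names and values.

```cpp
#include <chrono>
#include <cmath>

// One paired reading of the wall clock and the monotonic clock.
struct ClockSample {
    std::chrono::system_clock::time_point wall;
    std::chrono::steady_clock::time_point mono;
};

inline ClockSample sample_clocks() {
    return { std::chrono::system_clock::now(), std::chrono::steady_clock::now() };
}

// Seconds by which the wall clock moved relative to the monotonic clock
// between two samples. Negative means the wall clock was set back.
inline double wall_clock_jump(const ClockSample& prev, const ClockSample& cur) {
    std::chrono::duration<double> wall = cur.wall - prev.wall;
    std::chrono::duration<double> mono = cur.mono - prev.mono;
    return wall.count() - mono.count();
}

// True when the two clocks disagree by more than `tolerance` seconds.
// Note: on some platforms the monotonic clock does not advance while the
// machine is suspended, so a resume from sleep can also show up as a jump.
inline bool clock_jumped(const ClockSample& prev, const ClockSample& cur,
                         double tolerance = 5.0) {
    return std::fabs(wall_clock_jump(prev, cur)) > tolerance;
}
```

A client could take a sample on each pass of its main loop and, when `clock_jumped` reports a change, log a message and re-validate any stored absolute times.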

romw commented 9 years ago

Commented by Richard Haselgrove on 10 Oct 38247229 23:33 UTC

Replying to Nicolas:

> Wow, I really thought there was a ticket for this already.

Well, I searched both trac and the message boards before posting, and I couldn't find it.

> Many problems appear when the system clock is changed. Most are impossible to solve, or so hard that it's not worth it.
>
> For example, if your clock is 1 month ahead of the correct date and you contact a scheduler, the deferral time is stored as an absolute timestamp: when to contact the server again. If you then set your clock 1 month back (i.e. to correct time), communication with that project will be deferred for a month and a bit.

I agree there are lots of problems, but this particular one seems to cause significant loss of scientific work (by halting computing) at one specific and clearly-defined point: the two or three seconds between

Running CPU benchmarks

and

[benchmark_debug] Starting floating-point benchmark

That would seem to be worth solving on its own, and it shouldn't be too difficult to track down what it's waiting for.

romw commented 9 years ago

Commented by Richard Haselgrove on 21 Jan 38263431 19:33 UTC

I think I've got it:

File: cs_benchmark.C Routine: cpu_benchmarks_poll Line 309:

static double last_time = 0;

If benchmarks have been run in the future (as envisioned by changeset [12128], lines 247-248), this static variable will be pre-initialised to some time in the indefinite future. The test at line 312 will always be satisfied, and the application hangs, by indefinite looping.

Solution: discard variable last_time (or set it to zero) at all possible valid exit points from the benchmarking process.
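The failure mode and one possible guard can be sketched in isolation. This is not the code from cs_benchmark.C: it assumes the test at line 312 is a throttle of the form `now - last_time < period`, and `poll_due` and `POLL_PERIOD` are hypothetical names. It also differs slightly from the proposal above, in that it discards the stale value at the point of use rather than at each exit point of the benchmarking process.

```cpp
// Hypothetical poll period in seconds; the real constant may differ.
constexpr double POLL_PERIOD = 1.0;

// Returns true when the caller should do its periodic work at time `now`.
// `last_time` plays the role of the static variable from the report; it is
// passed by reference here so the behaviour can be exercised directly.
bool poll_due(double now, double& last_time) {
    if (now < last_time) {
        // The stored time lies in the future, so the clock was set back.
        // Discard it; otherwise the throttle below would reject every call
        // until the wall clock caught up again.
        last_time = 0;
    }
    if (now - last_time < POLL_PERIOD) return false;
    last_time = now;
    return true;
}
```

Without the first `if`, a `last_time` recorded while the clock was a month ahead makes `now - last_time` negative on every call, which matches the indefinite suspension described in the report.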

romw commented 9 years ago

Commented by Nicolas on 15 Jun 41645411 08:26 UTC

The Linux kernel recently grew an interface for apps to be notified of clock changes.
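The comment does not name the interface. One Linux mechanism that fits the description is the `TFD_TIMER_CANCEL_ON_SET` flag of `timerfd_settime`, and the sketch below assumes that is what was meant. The helper names `arm_clock_change_fd`, `open_clock_change_fd` and `realtime_clock_was_set` are hypothetical, and the code is Linux-only.

```cpp
#include <sys/timerfd.h>
#include <unistd.h>
#include <cerrno>
#include <cstdint>
#include <ctime>

#ifndef TFD_TIMER_CANCEL_ON_SET
#define TFD_TIMER_CANCEL_ON_SET (1 << 1)  // fallback for older libc headers
#endif

// Arm (or re-arm) an absolute CLOCK_REALTIME timer one year ahead, so that
// in practice only a discontinuous clock change ends it.
bool arm_clock_change_fd(int fd) {
    timespec now{};
    if (clock_gettime(CLOCK_REALTIME, &now) != 0) return false;
    itimerspec its{};
    its.it_value.tv_sec = now.tv_sec + 365L * 24 * 3600;
    return timerfd_settime(fd, TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET,
                           &its, nullptr) == 0;
}

// Create a non-blocking descriptor for clock-change detection.
// Returns -1 on failure.
int open_clock_change_fd() {
    int fd = timerfd_create(CLOCK_REALTIME, TFD_NONBLOCK | TFD_CLOEXEC);
    if (fd < 0) return -1;
    if (!arm_clock_change_fd(fd)) { close(fd); return -1; }
    return fd;
}

// Non-blocking check: read() fails with ECANCELED if the real-time clock
// was set since the timer was armed.
bool realtime_clock_was_set(int fd) {
    std::uint64_t expirations = 0;
    ssize_t n = read(fd, &expirations, sizeof expirations);
    if (n < 0 && errno == ECANCELED) {
        arm_clock_change_fd(fd);  // re-arm defensively for later changes
        return true;
    }
    return false;  // EAGAIN (nothing happened) or a plain expiry
}
```

The descriptor can also be added to a `select` or `poll` set, so a client with an existing event loop would not need to check it on a timer.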

romw commented 9 years ago

Commented by davea on 24 May 43204217 05:23 UTC

Fixed in b6aae1c

BOINC / boinc

indefinite suspension of computing when changing system clock #566