Closed ArtemGr closed 1 year ago
Are there 2 panic messages that overlap each other? :slightly_smiling_face:
thread 'COREthread '' panicked at 'pricesattempt to subtract with overflow' panicked at '', attempt to subtract with overflowmm2src/coins/utxo/rpc_clients.rs', :mm2src/portfolio/portfolio.rs:7231177:24
There 2 stack traces that are actually related to time difference calculation:
portfolio.rs - if now_ms() - last_price_broadcast > 10000
rpc_clients.rs - if now_ms() / 1000 - last as u64 > ELECTRUM_TIMEOUT
I expect that this can happen only if system time went backwards for some reason, is it?
cc @Kiruel
> I expect that this can happen only if system time went backwards for some reason, is it?
Looks like this might be possible.
There's an account of this code being too complex to be reliable:
"I worked on the kernel clock functions for a while. The way it's supposed to work is that the MONOTONIC clock never goes backwards. That's sort of by definition. In practice, MONOTONIC is implemented as an offset from REALTIME. Any time the realtime clock is adjusted, or the system sleeps, this offset is adjusted. Under the wrong circumstances, the offset could be miscalculated, resulting in a jump -- even a backwards jump -- of the MONOTONIC clock. The built-in realtime clock in most hardware only provides time to the nearest second, while Linux keeps time internally to the nearest microsecond or nanosecond. Linux tries to resolve these issues when going to sleep and waking up, but the logic is quite convoluted and I don't understand it. Add in adjustments to REALTIME caused by NTP adjustments, user time adjustments, and who knows what else, and you have a recipe for chaos. Bottom line is that I'm not really surprised that MONTONIC sometimes goes backwards." – Edward Falk, https://stackoverflow.com/questions/3657289/linux-clock-gettimeclock-monotonic-strange-non-monotonic-behavior
And there's at least one known issue: https://stackoverflow.com/a/3657433/257568. There might be kernels having it in the wild.
And we're not even using the MONOTONIC
clock, we're using the normal UNIX time which has no guarantees, buggy or otherwise, of not going backwards.
We should probably add a monotonic clock function to common
, for algorithms requiring the monotonic time. Simply checking against a global atomic integer might be simpler and more reliable than using the kernel's MONOTONIC
clock. Thoughts?
@ArtemGr Adding MONOTONIC
clock function to common
will help us with our own code that we have good knowledge and full control of. But can we be sure that all dependencies that we rely on would work well if there is gap in system time for any reason? What if MM2 is started with incorrect
system time, but it changes to correct then, does it mean that our monotonic clock will return incorrect time in such case? Moreover checking time correctness might be even more important, bad things can happen if time is not correct from the beginning and MM keeps using it.
Summary:
Some distributed systems depend on clock time synchronization, while others use their own virtual clocks (1, 2, 3, ...) to determine the ordering of the distributed operations. The latter ones would see no breakage from a diverging clock time. I'm curious, do we use anything specific that requires clock time synchronization? Electron? Blockchains?
Anyhow, loading our own reliable time when mm2 starts is a nice idea. But we should probably track it as a separate issue. Specifically, incorrect clock time doesn't trigger the overflow panic with which we are concerned here (the time might be wrong but monotonic). And we can have a correct clock time (withing the reasonable +1/-1 minute marging) on a phone and yet still see this bug triggered due to kernel changing the time in a non-monotonic fashion.
Our code should work fine then, but would 3rd party work?
I don't think we have a lot of choice here apart from removing such failing dependencies or fixing them to use a monotonic time function or into being lax towards the time going backwards.
Stack trace: https://gist.github.com/Kiruel/00e52d08328ca8f097111a62977b32ab. Discord original. The source code is probably at bb94f1b548c68e6b9fead3f1505c8f963878dee1 (in the mm2-cross branch).