Open gibmat opened 7 months ago
Thanks, we'll keep an eye on it. Unfortunately, our tests are not well isolated, so this kind of flakiness is hard to squash entirely. If it happens again, the LIBDQLITE_TRACE=1 output would be useful in figuring out the culprit.
Yeah, I haven't been able to reproduce this exact failure myself. Would adding LIBDQLITE_TRACE=1 when running the tests have any adverse side effects? If not, I'll just add it to the normal packaging rules so it's always there to help with future debugging.
No, there shouldn't be any adverse effects.
I'd say the problem is that those tests are time-dependent and not deterministic. I would argue that a unit/integration test suite that is run at package build time should contain only deterministic, time-independent tests.
From what I've seen, this test (and others) fails simply because the hard-coded timeouts or timing expectations don't match the capacity of the underlying hardware.
While adding LIBDQLITE_TRACE=1 might help figure out exactly which timing is wrong, I think the most robust solution would be to rewrite those tests, because tweaking timings is intrinsically fragile.
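To illustrate what I mean by time-independent, here is a minimal sketch in Python. It is not dqlite's test code; FakeClock and Deadline are hypothetical names made up for this example. The idea is that the code under test reads time only through an injected clock, so the test advances time itself instead of sleeping and hoping the host is fast enough.

```python
import time


class FakeClock:
    """Manually advanced clock, used in place of time.monotonic in tests."""

    def __init__(self, start=0.0):
        self.now = start

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class Deadline:
    """Hypothetical timeout helper that reads time only through `clock`."""

    def __init__(self, timeout, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + timeout

    def expired(self):
        return self._clock() >= self._expires_at


def test_deadline_expires_exactly_at_timeout():
    # No sleeping: the test controls time, so the result does not
    # depend on how fast or loaded the build host is.
    clock = FakeClock()
    deadline = Deadline(timeout=5.0, clock=clock)
    clock.advance(4.0)
    assert not deadline.expired()
    clock.advance(1.0)
    assert deadline.expired()


if __name__ == "__main__":
    test_deadline_expires_exactly_at_timeout()
    print("ok")
```

With this structure the same assertions hold on a slow arm64 builder and on a fast workstation, because no wall-clock time is involved. Whether the existing tests can be restructured this way is something I haven't checked.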
I've enabled LIBDQLITE_TRACE=1 when running the tests, and it will be included whenever the next upload of dqlite is made to unstable.
During a recent rebuild of dqlite 1.16.4 (using bundled libraft) on an arm64 host, one test failed. It's likely this is a flaky test: