jschlatow opened this issue 6 months ago
Does this mysterious effect persist when restarting the VM?

If not, maybe you are observing an overflow of the free-running counter returned by Thread::_call_time? To rule out or detect this situation, could you consider adding a check in Timer::Device::now() in the timer driver that remembers the current value, checks that each new value is higher than the last one, and prints an error should this invariant be violated?
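For concreteness, here is a minimal sketch of such a check. The helper class, its member names, and the assumption that now() deals in plain 64-bit microsecond values are illustrative only, not the actual Timer::Device interface:

```cpp
#include <base/log.h>
#include <base/stdint.h>

/* hypothetical helper, fed with each value that Timer::Device::now() returns */
struct Monotonic_check
{
	Genode::uint64_t _last { 0 };

	void apply(Genode::uint64_t const now)
	{
		/* the invariant: reported time must never decrease */
		if (now < _last)
			Genode::error("timer value went backwards: ", _last, " -> ", now);

		_last = now;
	}
};
```

Passing every value returned by now() through such a check should make a backward jump immediately visible in the log.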
Should the effect disappear upon the restart of the VM, the issue might be more local to the virtualization interface.
> Does this mysterious effect persist when restarting the VM?

Yes, it persists. Restarting the VM actually takes forever.

> If not, maybe you are observing an overflow of the free-running counter returned by Thread::_call_time? To rule out or detect this situation, could you consider adding a check in Timer::Device::now() in the timer driver that remembers the current value, checks that each new value is higher than the last one, and prints an error should this invariant be violated?
I instrumented the method as you suggested. The counter overflowed once, yet this did not have any noticeable impact. I observed the aforementioned effect only about an hour later and without any additional error message.
I just noticed that I had accidentally created this issue in the Goa repository and have therefore transferred it.
@jschlatow: I can actually confirm this. A hard restart in Leitzentrale helps. So it looks like a problem in VBox6 or one of its dependencies. As you stated, the problem requires a couple of hours to appear.

As additional information: I don't observe any system load in top_view, nor CPU frequencies going up in msr_gui, where they remain at 800 MHz or below.
> I instrumented the method as you suggested. The counter overflowed once, [...]

That is very worrying because this can have all kinds of effects, ranging from the missed (or hours-late) firing of timeouts to the wrapping of the values returned by Timer::Session::elapsed_us, which is unexpected by clients. The continuity of kernel time is a fundamental assumption. Therefore, before investigating the problem anywhere else, we need to eliminate the wrapping of the time value reported by the kernel.
In my understanding, the kernel tick counter is a 64-bit value that counts at a rate of around 2 GHz. These values are converted to microseconds (timer_ticks_to_us) using a calibration value ticks_per_ms. This conversion must anticipate an uptime longer than a few hours, which is seemingly not the case right now.

I'll try to reproduce the wrap-around using a synthetic test.
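Such a synthetic test could look roughly like the sketch below. It assumes the conversion can be compiled into a small host program with the two-argument signature implied above (the extern declaration is a placeholder to be adapted to the actual function), and the tick rate of 2,000,000 ticks per millisecond (~2 GHz) is taken from the estimate in this comment:

```cpp
#include <cstdint>
#include <cstdio>

/* placeholder declaration -- link against or copy the conversion under test */
extern uint64_t timer_ticks_to_us(uint64_t ticks, uint64_t ticks_per_ms);

int main()
{
	uint64_t const ticks_per_ms = 2'000'000;  /* assumed ~2 GHz tick rate */

	/* walk roughly 2.4 hours worth of ticks in a coarse stride and check
	   that the converted time never decreases; adjust range and stride to
	   trade runtime for coverage */
	uint64_t prev_us = 0;
	for (uint64_t ticks = 0; ticks < (1ull << 44); ticks += 997) {
		uint64_t const us = timer_ticks_to_us(ticks, ticks_per_ms);
		if (us < prev_us)
			printf("time goes backwards at ticks=0x%llx: %llu -> %llu us\n",
			       (unsigned long long)ticks,
			       (unsigned long long)prev_us,
			       (unsigned long long)us);
		prev_us = us;
	}
	return 0;
}
```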
@nfeske I haven't written down the counter values, but I remember noticing that the two values were very close. Thinking about this, it didn't seem like a typical overflow situation.
By brute-force testing I discovered that timer_ticks_to_us() does not always yield increasing results for increasing ticks values. For example, with 0x4ffffffff, 0x500000000, and 0x500000001 as input, the clock ticks backwards by about 588 us. As I was not able to fix the current implementation up to now, you may just replace it with a simple return ticks * 1000 / ticks_per_ms for testing.
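For comparison, a conversion that stays monotonic while keeping the intermediate product small could split the tick value before scaling. This is only a sketch of the idea, not the actual base-hw implementation or the fix referenced below:

```cpp
#include <cstdint>

/* sketch: convert ticks to microseconds without a large intermediate product */
uint64_t ticks_to_us(uint64_t const ticks, uint64_t const ticks_per_ms)
{
	uint64_t const ms  = ticks / ticks_per_ms;  /* whole milliseconds */
	uint64_t const rem = ticks % ticks_per_ms;  /* sub-millisecond remainder */

	/* only the bounded remainder is multiplied by 1000, and the remainder
	   term always stays below 1000 us, so the result is monotonic */
	return ms*1000 + (rem*1000)/ticks_per_ms;
}
```

Unlike the plain ticks * 1000 / ticks_per_ms, only the small remainder gets multiplied here, so no 64-bit intermediate can overflow for any realistic uptime.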
@jschlatow does the suggested change above or 6b6f31190c7089ff471599b5f5ef6eeecf1ec226 have a positive impact on this issue?
@chelmuth I'm still in the process of testing the fix since the issue is a bit sporadic. Without 6b6f311, I noticed that the degradation occurred with and without a preceding timer overflow and, vice versa, I have seen timer overflows without noticing any degradation. I'll keep you updated.
@chelmuth Unfortunately, 6b6f311 does not affect this issue.
The effect is still reproducible with Sculpt 24.10.2.
After almost a week of running Sculpt 24.04.1 with base-hw on my T490s, I repeatedly observed the following effect: goa run --target sculpt, which uploads a bunch of binaries, works flawlessly on a freshly booted system. However, once I have lost the key repeat events, the upload is slower and occasionally makes my VM unusable for a few seconds.