Closed chelmuth closed 10 months ago
I investigated this and I found solution that seem to fix the issue. Commit tomga/genode-rpi@f072edab1e5a18f3e59ceaf523c1fbfe504d52c0 contains required change. Without this test-timer
failed on each run and other tests failed randomly. With that change all depot autopilot tests run successfully few times.
Before I got to above solution, I browsed through time interpolation code and I believe I may have found an issue there, that can optimize calculations. In this line there is a variable defined that is not updated at all and it seems that the intention was to use it to quickly get out of probing loop, if current result is good enough. I think that tomga/genode@3290e35354772065cda5ac51b90b7bfe0309debb may be the proper fix. It seems to not break anything on rpi1 and during debugging caused quitting loop after 1 iteration, not 5.
@tomga I added a preliminary fixed label. I came across the missing latency_us
too, but unfortunately forgot about it. @m-stein what do you think about the change?
@chelmuth, do you plan to cherry pick this change into genodelabs/genode-rpi repository?
@chelmuth, do you plan to cherry pick this change into genodelabs/genode-rpi repository?
Yes, but currently I do not understand what is fixed by the change...
Actually, I don't understand it fully either :-)
I saw that error comes from this function and it was the only calculation that I could imagine to suffer from being performed on too short integers.
@tomga @chelmuth Thanks for pointing me to the latency_us issue! I found that this was broken already in the first version of this file and, somehow, made it unfixed through all my reviews throughout the years. As far as I can tell, https://github.com/tomga/genode/commit/3290e35354772065cda5ac51b90b7bfe0309debb is exactly what I meant to implement in 2017 and can be merged :)
Regarding https://github.com/tomga/genode-rpi/commit/f072edab1e5a18f3e59ceaf523c1fbfe504d52c0: The CLO and CMP regs are uint32, so, I can imagine that the compiler used to do the subtraction in uint32 in contrast to uint64 (time_t). This makes a difference whenever CLO < CMP which should never be the case if IRQ = 1. However, I think we have a raise at hand that should be fixed by https://github.com/m-stein/genode-rpi/commit/1a315cc4ef39a3c7508aefdf4a60c117af496624.
@tomga @chelmuth regarding the system timer implementation for broadcom, the cast of the 32-bit counter value to a 64-bit value is no good solution. It might have fixed your observation by accident only?! However, if I fill value like 0x0 for the actual 32-bit counter, and 0xffffffff for the comparision register (meaning the interrupt triggered one tick before, and the counter wrapped, with this solution the calculation gets wrong. The actual problem lies in my work on "enabling strict warnings in base-hw" from February in commit genodelabs/genode-rpi@e943f72. Here I broke the overall 32-bit calculation with correct wrapping. I'll provide a solution quickly.
Probably my change just made the issue much less frequent. Probably 2^32 times less frequent, which would be definitely enough to not see it in test run.
I've provided a fix for the issue and the potential raise reported by @m-stein. Are you all fine with it? Then I would push it into genode-rpi upstream.
I'm fine if it works on hardware.
No objections except a missing s in the comment about 32-bit registers.
@skalk, I also confirm that with this patch depot autopilot tests work and that this change looks definitely better than mine. Thank you.
@tomga thank you for your investigation and identifying the right place!
@tomga @chelmuth Thanks for pointing me to the latency_us issue! I found that this was broken already in the first version of this file and, somehow, made it unfixed through all my reviews throughout the years. As far as I can tell, tomga@3290e35 is exactly what I meant to implement in 2017 and can be merged :)
@chelmuth I assume that issue was marked as completed automatically and modification of interpolation code was just forgotten, am I correct?
@chelmuth I assume that issue was marked as completed automatically and modification of interpolation code was just forgotten, am I correct?
I intend to merge the adaption of the interpolation after the Sculpt release. The RPi commit solved the issue and triggered the close on merge.
@tomga merged 567457e3c144ebd88e05eacff7d38b17baf3a129 just now.
Thanks. I wasn't aware that Sculpt release is so close and thought that we may forget about this change if we have the issue closed.
Fixed in master.
I interpret the following logs as result of integer overflows / calculation errors in the timeout framework.