Closed Nathan-Stohs closed 9 years ago
I think developers who has examined the Timer subsystem are frustrated. I know that I have been frustrated ever since I was forced to debug it while looking into something else.
Please do me a big favor and use full function names and at least short file names, otherwise I don't know what the parent post references. For instance, do you mean:
CPU_Timer_SetCompare(Timer,CompareValue)
TIM_SetCompare3(TIMx, Compare3)
STM32F10x_AdvancedTimer::SetCompare(compareValue)
HAL_Time_SetCompare(CompareTicks)
Thank you for referencing the commit used for the tests.
Please do me a big favor and use full function names and at least short file names, otherwise I don't know what the parent post references. For instance, do you mean:
Point taken. I meant STM32F10x_AdvancedTimer::SetCompare(compareValue)
(the third bullet)
The commit that started the spam seems to be this -- 3d27579. I am looking into the cause and a possible fix. Right now I am not sure what the cause is, but just wanted to update everyone.
I may have already fixed it in the bugUninit branch. Do you still see it in the bugUninit branch? In order to solve #243 'Programs with COM2 and with CSMA stop', I had to work through a lot of virtual timer issues last night.
The bug is in STM32F10x_AdvancedTimer::SetCompare
in file netmf_advancedtimer.cpp
. After TIM2 is set with the upper 16 bits of target, there is a check to verify if TIM2 was missed. This check fails for the very first time when VIRT_TIMER_TIME
is set with a compareValue
of 2^32. So TIM1 is set with the lower 16 bits of target and this causes the 8 ms timer spam that lasts for about 2.25 seconds. The fix for this would be to set TIM1 only if TIM2->CNT is not zero. However, I am not sure if this check is robust enough. Btw, the bug exists in the bugUninit branch also.
SPECULATION:
Actually I have a guess. Is the VT passing down compare values that are actually >= 2^32, instead of passing down at most MAX_TICKS
or whatever (0xFFFFFFFF)?
Furthermore I'm guessing that the MAX_TICKS (the large default idle value) that the VT passes to SetCompare is actually somehow MAX_TICKS + HAL_CurrentTicks()
and so the time % 32-bit wraps around to the current time, and make the driver think that "large time in the future" is really "now".
This goes back to how VT is schizophrenic on if the adtim is 32-bit or 64-bit and tries to do both, and who actually keeps track of the "real" 64-bit time and 32-bit overflows. Specifically, here the VT Is wrapping time at 32-bit but expecting a 64-bit compare to work, so we have mismatch.
To fix, do one of:
Yes, the VT does pass a 64-bit time instead of MAX_TICKS. At the beginning of STM32F10x_AdvancedTimer::SetCompare we check whether the 64-bit time we pass is less than "now" which is a 32-bit time. There will be problems at that point once the passed time is larger than a 32-bit max value. That might be the problem I am tracking down right now different than the VT spam issue.
The code isn't supposed to ever expect a 64-bit time to work. The code (currently) expects the upper 32-bits to be ignored after checking whether or not we need the "TIME_CUSHION".
I also bet "TIME_CUSHION" is the time of one of the VT spam.
I'm testing a fix now for the VT spam and my other timer issue that occurs after the 32-bit timer wrap around.
The "epoch" code in the adtim that I have up in the pull request handles all 64-bit in the adtim, but we'd need to get the VT on board.
Why does the check miss?
When STM32F10x_AdvancedTimer::SetCompare
is called with 2^32, the result of this operation is 0.
tar_upper = (compareValue >> 16) & 0xFFFF
TIM2 is then set with 0. The if statement in the check is as follows:
if (TIM2->CNT != tar_upper || TIM_GetITStatus(TIM2,TIM_IT_CC1) == SET )
Both TIM2->CNT
and tar_upper
are 0 and the status of TIM2 is RESET (not sure why, but found this while debugging. It always is RESET) and the check fails and TIM1 is set.
Why does this happen more than once?
As soon as the first check fails, TIM1 is set, which invokes VTCallBack (from ISR_TIM1 callback), which in turn invokes SetCompare
. TIM2->CNT is equal to tar_upper and so the check fails, TIM1 is set again and the cycle repeats.
When STM32F10x_AdvancedTimer::SetCompare is called with 2^32, the result of this operation is 0.
Right, and the problem is that the VT, which assumes that the adtim is 32-bit, is passing in 64-bit values (or at least, something greater than 0xFFFFFFFF). So these assumptions are conflicting. See my data dump above, the VT is passing down, for example, 4295033065, which is > 2^32, which doesn't grok with MAX_TICKS == 0xFFFFFFFF
But its actually more sinister than that, its not just passing in 2^32 (or 2^32-1), its passing in 2^32+current time, so the timer doesn't just wrap sometimes, it always wraps.
The adtim, because it is 32-bit, throws away the upper bits, and subtracts 2^32, so we end up with 2^32 + currentTime - 2^32 which of course equals the currentTime, and the adtim driver thinks that means "fire now" and we move on to setting TIM1.
I actually think the adtim driver should be 64-bit, but I don't think I'm going to get my way there, at least not right now. But the important thing is that the VT and the adtim need to agree on what it is. The current adtim driver is 32-bit, but the VT tries to treat it as both 32-bit and 64-bit, which is causing the failure here.
If we want to wait for max time that the 32-bit counter can do (say 0xffff ffff) and current time is 0x0100 0100 then we would set the compare for (0x0100 00ff) right? In that case our current code would think we "missed" and we set the lower for 0x00ff which goes off in 8.19 ms.
If we want to wait for max time that the 32-bit counter can do (say 0xffff ffff) and current time is 0x01000100 then we would set the compare for (0x100 00ff) right? In that case our current code would think we "missed" and we set the lower for 0x00ff which goes off in 8.19 ms.
No, you wouldn't set it to that. You always have to break at 0xFFFFFFFF. Remember, you are only looking at the bottom 32-bits. If the adtim is given a compare request for now-1
, it has no way to know if that request is for a timer that missed or a wrap. So again, its a 32/64 bit issue.
So an idle value time (a very big number) is passed for us to trigger on. Our time is 0x0100 0100 (looking at only the lower 32-bits of the system time). What do we set the upper 16-bit timer to?
0xFFFF
Check ba516cf2d3fe50d12cdd93448c3ace1562b705f3. I believe it fixes the spam problem and a few other bugs.
Looks fine, haven't tried it.
FYI what I believe you have implemented is effectively a 33-bit timer, with the MSB in software.
Fixed by aa6ba2f2bf3c8f8cee0da9e46d42803dec6940d0
Running Chris' app in #282 (with 800ms timers instead of 100ms), I logged all
SetCompare()
requests at the timer low-level to memory from start of TinyCLR (no TinyBooter) until the program gets deadlocked in the execution loop. Then I dumped the array with GDB. This was in debug mode with-O2
.I don't know if this is related to #282 , I think it is, but either way this needs attention. Data Formatting is difficult in this medium, I have a spreadsheet if desired.
First column is the actual compare time requested. Second column is difference from last time requested. Last column is the same delta but in milliseconds.
Unfortunately I did not log the current time which could have been illuminating.
Time starts at 0, so all of these requests for times of 4 billion+ are pure garbage. Actually, only 5 of the 317 calls are directly attributable to the program (C# user timers). The rest is junk, minus perhaps a few OS requests.
I was running the logic analyser at the same time, and I counted 5 transitions on the IO. There are exactly 5 lines similar to:
Which I assume are the 800ms user timers, and the only timers that should be running.
It is extremely difficult to debug other problems until this mess is taken care of, and this is almost certainly where our performance bleed is coming from.
After the last timer, no further timer requests are made. It is unknown what happens at the end, if we lose a callback, lose a time, or we never set it up for other reasons. When we deadlock, it is in the C# interpreter, not in sleep or otherwise apparently involving the timer. Difficult to say what its waiting for.
I have not looked at other optimization levels. Likely they look different.
EDIT: This was in master on 4eb63cb926feb01946afa07515e6d8438a45af7b