Samraksh / eMote

eMote OS -- Multiple Ports (using .NET MF v4.3)
0 stars 0 forks source link

Odd timing results for RealTime and standard timers #233

Open WilliamAtSamraksh opened 9 years ago

WilliamAtSamraksh commented 9 years ago

I'm getting some odd results for the RealTime and standard timers. The program exhibiting this is in the share:

Dev eMote\GitHub Issues Attachments\2015.01.22 Timer Jitter.zip

(Don't get confused by another zip file of the same name but an earlier date.)

The timers are each set to tick at 10 millisec (10,000 microsec) intervals. The callback toggles a GPIO pin as soon as it's invoked. Here are some of the intervals calculated from the logic analyzer trace file; the file is "Logic - timer jitter.xlsx" in the zip archive. Times are in ms (millisec).

RealTime Timer

image Notice that the first interval is reasonable, about 10 ms. The next few are large, 25-27 ms. After that it settles down to about 13 ms, with some that are at least 16 ms (not shown here). The initial variation seems odd as are the occasional large intervals that follow.

Standard Timer

image Here we see a similar pattern, with a low (and reasonable) initial interval, several large ones, then settling down to about 10 ms. Oddly, the latter intervals are closer to the desired 10 ms than for the RealTime timer. A scan of the subsequent intervals shows little change.

So ... Why the the stretch of large intervals in both cases? And why isn't the RealTimer giving intervals closer to 10 ms?

KennethAtSamraksh commented 9 years ago

Because its broken.

Worse than 0.01 ms of error is a bug.

Sent from my Verizon Wireless 4G LTE smartphone

-------- Original message -------- From: William Leal notifications@github.com Date:01/22/2015 5:14 PM (GMT-05:00) To: Samraksh/MF MF@noreply.github.com Subject: [MF] Odd timing results for RealTime and standard timers (#233)

I'm getting some odd results for the RealTime and standard timers. The program exhibiting this is in the share:

Dev eMote\GitHub Issues Attachments\2015.01.22 Timer Jitter.zip

(Don't get confused by another zip file of the same name but an earlier date.)

The timers are each set to tick at 10 millisec (10,000 microsec) intervals. The callback toggles a GPIO pin as soon as it's invoked. Here are some of the intervals calculated from the logic analyzer trace file; the file is "Logic - timer jitter.xlsx" in the zip archive. Times are in ms (millisec).

RealTime Timer

[image]https://cloud.githubusercontent.com/assets/1988609/5865560/5b62fdae-a258-11e4-8757-4323587d1d2e.png Notice that the first interval is reasonable, about 10 ms. The next few are large, 25-27 ms. After that it settles down to about 13 ms, with some that are at least 16 ms (not shown here). The initial variation seems odd as are the occasional large intervals that follow.

Standard Timer

[image]https://cloud.githubusercontent.com/assets/1988609/5865636/da88b33a-a258-11e4-9b71-d8ab8dfdc59f.png Here we see a similar pattern, with a low (and reasonable) initial interval, several large ones, then settling down to about 10 ms. Oddly, the latter intervals are closer to the desired 10 ms than for the RealTime timer. A scan of the subsequent intervals shows little change.

So ... Why the the stretch of large intervals in both cases? And why isn't the RealTimer giving intervals closer to 10 ms?

Reply to this email directly or view it on GitHubhttps://github.com/Samraksh/MF/issues/233.

Nathan-Stohs commented 9 years ago

Iterations 2-11 of Bill's test being way off is a direct result of the C# debug prints that are done for the first 10 iterations. Disable this, specifically the C# line #define LoopTrace and its rock solid. If you remove the first 11 loops from consideration, or just remove the #define, then the stddev between real time events is less than a microsecond (seems good) and max-min = 5 microseconds across 2989 iterations (again, seems good). These times are directly measured from the Logic analyzer.

One failure mode would be stuffing too much stuff inside an interrupt that runs repeats too fast, in other words the execution time on your callback is longer than your interval. I could easily imagine this happening with UART debug prints, as they along with string operations are pretty expensive, along with MF just being slow in general.

However digging in a little deeper that doesn't seem to be the case here. From my testing, when a repeating RT timer, say 50ms, is set with a callback, what you get is not a 50ms timer but really a 50ms+C timer where C is the execution time for the callback itself. In other words the underlying timer is being set as a series of one-shots that are not even close to back-to-back. For instance I can add an empty for loop to increase C, and thus the callback interval, as much as I want. This is a serious problem if not known (I assume it was not). In the meantime, for single-shot events, the RT seems to work as advertised.

I will have to go back and verify when if ever this was not the case. The recent VT-RT integration would be a natural suspect however Bill's testing was on R11.2 which presumably was before that.

EDIT: My best guess as to why testing didn't catch this would be that existing tests used callbacks of trivial size (say just toggling a GPIO) and so the error (C) that did occur was small enough to be negligible or written off as some offset. Tests should vary the callback execution time.

mukundansridharan commented 9 years ago

While it is true that a series of single shot hardware timers are used to run the driver, this does not mean that what we get is actually a timer_value + C# handler_run_time. While the real-time does not try to compensate for delays in native code, it sets the timer for the next event before it call the C# handler to execute the code. There fore if the handler finishes before the next event fires, you get a perfect the timer_val + some_native_code_delay. That is, if the C# handler takes 4ms and the RT timer is set for 5ms, you get perfect 5ms executions.

The behavior above happens because of the PendSV interrupt having the lowest priority. While the RT C# callback code runs in the interrupt mode it does not run under the high priority timer interrupt, it runs under a semi-soft PendSV interrupt which I believe is set to have the lowest priority. So while the C# callback of RT quickly returns, debug text sent over usart is queued up in the PAL buffer and now it has priority over further PendSV interrupts. While the realtime hardware timer itselft runs on time and keeps triggering the PendSV, the PendSV handler does not fire becuase of usart data and subsequently many of the PendSV interrupts gets lost. And when finally PendSV finds a gap in usart it runs only queue more usart and the cycle continues.

Nathan-Stohs commented 9 years ago

I too initially thought that it was a priority issue at first, however I can eliminate the UART from the equation.

For example the following C# code, setup as a 100ms RT callback:

            public static void OnTick(DateTime time, object timer) {
                Globals.Globals.GpioJ12P1.Write(true);
                for (var i = 0; i < 400; i++) ;
                Globals.Globals.GpioJ12P1.Write(false);
            }

When measured with the Logic8 will show a callback-to-callback (high-to-high) time of 110ms. for loop time (as measured from GPIO) was 7ms.

but

            public static void OnTick(DateTime time, object timer) {
                Globals.Globals.GpioJ12P1.Write(true);
                //for (var i = 0; i < 400; i++) ;
                Globals.Globals.GpioJ12P1.Write(false);
            }

Has a callback-to-callback time of 103ms, which is 110ms - 7ms. So it looks to me like the timer is getting set after rather than before the C# execution time. Other overhead here may also explain the remaining 3ms error, not sure. Strangely, the non-RT timer does not have this problem, at least in master.

Both cases above have identical (low) jitter.

EDIT: My point here isn't to argue, I just needed to be sure of what I should be looking for when tracing backwards in history and that in fact this was incorrect behavior. So now I can proceed to see if we have a build that didn't do this.

Nathan-Stohs commented 9 years ago

I have saved my C# code showing the bug in dropbox GitHub Issues Attachments folder. This is just a reduced version of Bill's jitter code.

I have tested my code as far back as R11 (circa July 2014) and the bug is still present. This is before the VT merge so that seems unrelated so far. If anybody has any other suggested points in time please let me know and I'd be happy to build and test. Possibly it never worked.

This bug has moved out of my planned scope so I'm going to have to kick it upstream. Digging into this one isn't my mandate right now, I have other tasks and the bandwidth I set aside here was only intended to help polish R12.

            public static void OnTick(DateTime time, object timer) {
                Globals.Globals.GpioJ12P1.Write(true);
                for (var i = 0; i < 800; i++) ;
                Globals.Globals.GpioJ12P1.Write(false);
            }

I'm attaching two images to help explain the issue. Both use my code snippets from above, e.g. just a RT callback with a GPIO toggle, with and without a for busy-wait loop. The RT interval is 100ms. These particular tests were done on R11.0 (so no VT).

rtbug2 This is without the for loop, so the callback is almost empty. The width is about right, almost 100ms, which gets even closer when you add in the toggle time itself. This is the happy case when the callback is basically empty.

rtbug1 This is with for (var i = 0; i < 800; i++) ; in between the toggle. Notice that the GPIO low-to-high time is almost 100ms, but the high-to-high time (which is the actual RT interval) is obviously much larger (146ms). Again, at face value this suggests that the RT is being reset at the wrong time, at the end of the callback instead of the beginning.

Note that the GPIO toggles show the loop time is about 40ms, meaning we are not violating the execution time < RT interval rule.

ChrisAtSamraksh commented 9 years ago

This is what the problem is:

Originally the realTime timer was written for a 16-bit hardware counter. So at 8MHz we can have interrupts up to 8 ms or so for a max set compare. Obviously this would limit a user greatly so the realTimer timer would set a timer interrupt for under 8ms and then keep track of how many times we got interrupted until we reach the time the user wants to fire the real time timer C# user callback.

All of this is unneeded with the Virtual Timer code.

What was happening is that a 50 ms RealTime timer was having to have its real time native callback routine called, say for example 7 times, and then after those 7 native callbacks the C# user callback would be called. The VT was firing the first of 7 native callback appropriately, but wasn't able to do any of the other 6 native real time callbacks appropriately if the C# user callback was taking a very long time (relatively speaking).

So basically we rip out the need for all of the multiple real time native callbacks because the Virtual Timer takes over that functionality and we should be able to handle long C# user callbacks (shorter than the real time timer period of course).

ChrisAtSamraksh commented 9 years ago

The real time timer no longer keeps track of roll overs itself and lets the virtual timer code handle that. Please re-test with 6a301b76d194618eb8315a84ff68e879ff7d970a.

The virtual timer only will work up to 8min 56 seconds at 8MHz, and since the real time timer runs through that code now, the longest real timer timer interrupts we can generate max out at that time period at 8MHz. C# timers can go for much longer periods of time.

WilliamAtSamraksh commented 9 years ago

My latest test shows that things seem to be OK. Here are the results for 3000 samples. I included the option for make-work threads which consume CPU cycles. The results are shown for no threads and 1 thread. image There's an anomaly that I don't have an explanation for. The interval measurements use the DateTime value provided as an argument to the RealTime timer callback. The callback toggles a GPIO pin. Using the DateTime value, some intervals are greater than 100 microsec. Using LogicAnalyzer output, none are. Here are the LogicAnalyzer results: image

WilliamAtSamraksh commented 9 years ago

Oops, closed by accident.

WilliamAtSamraksh commented 9 years ago

In a run yesterday of the timer jitter profiler, I got a large callback interval for the RealTime timer. Notice the max value of 108 microseconds.

image

Threads gives the number of make-work threads running in the background. The "Debug Log" section gives the intervals as measured by the difference in the DataTime value passed to the callback method; the "Logic Analyzer" section gives the intervals as measured by the logic analyzer.

The full spreadsheet is in the share

Dev eMote\GitHub Issues Attachments\2015.03.05 Timer Results Analysis.xlsx

ChrisAtSamraksh commented 9 years ago

Does this happen every test? Which build is this based on?

WilliamAtSamraksh commented 9 years ago

The difference has shown up each time I've used the logic analyzer (probably 4 times so far).

.NetMF v4.3_1.0.11 EmoteDotNow, Build Date: .NetMF v4.3_1.0.11 12:51:21 EmoteDotNow, Build Date: Mar 4 2015 11:28:18 GNU Compiler version 4.7.3 Optimization: speed-optimized, no-size-optimization TinyCLR (Build 4.3.0.0) Software ID: 74dfe6d72b983cb57707c5ec9e7cf9f66ab84d68-modified by: Leal Starting... Created EE. Started Hardware. MSdbgV1m▒a ▒▒Create TS. Loading start at 8076518, end 80882bc Assembly: mscorlib (4.3.0.0) Assembly: Microsoft.SPOT.Native (4.3.0.0) Assembly: Microsoft.SPOT.Hardware (4.3.0.0) Assembly: Microsoft.SPOT.Hardware.PWM (4.3.0.0) Assembly: Microsoft.SPOT.Hardware.SerialPort (4.3.0.0) Assembly: Microsoft.SPOT.Hardware.Usb (4.3.0.0) Assembly: Microsoft.SPOT.Hardware.OneWire (4.3.0.0) Loading Deployment Assemblies. Attaching deployed file. Assembly: Microsoft.SPOT.Graphics (4.3.0.0) Attaching deployed file. Assembly: Globals (1.0.0.0) Attaching deployed file. Assembly: Samraksh.AppNote.Utility (1.0.0.0) Attaching deployed file. Assembly: Samraksh_eMote_DotNow (1.0.0.1) Attaching deployed file. Assembly: Samraksh_eMote_RealTime (4.3.0.12) Attaching deployed file. Assembly: Samraksh.AppNote.TimerJitter (1.0.5541.26465) Resolving. Ready.