Samraksh / eMote

eMote OS -- Multiple Ports (using .NET MF v4.3)

ADAPT eMote programs eventually hang #128

Closed. WilliamAtSamraksh closed this issue 10 years ago.

WilliamAtSamraksh commented 10 years ago

I've been running a variety of programs (using the 2014.04.14 version of eMote for ADAPT). In most cases, the programs are intended to run forever. However, all of these eventually seem to hang. There is no error message on the console.

It is not certain whether console output is merely delayed or stops altogether. However, when a program toggles an LED, I do see the light generally turn off and stay off.

I've put "2014.05.13 ADAPT jitter profiler.zip" in the DropBox share "GitHub Issues Attachments". What runs is controlled by preprocessor symbols defined at the top of Program.cs. The issue can be demonstrated with almost any of the symbols enabled. It shows up sooner if SamplingIntervalMilliSec is small (for example, 4000).

MichaelAtSamraksh commented 10 years ago

This was fun. There is a comment in DeviceCode\pal\AsyncProcCall\Completions.cpp as follows:

// In case there's no other request to serve, set the next interrupt to be 356 years since last powerup (@25kHz).
HAL_Time_SetCompare_Completion( ptrNext->Next() ? ptrNext->EventTimeTicks : HAL_Completion_IdleValue );

But earlier, we see

static const UINT64 HAL_Completion_IdleValue = 0x0000FFFFFFFFFFFFull;
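As a quick sanity check on the comment's "356 years" figure, the sketch below divides HAL_Completion_IdleValue by a tick rate. It assumes the 64-bit tick count starts at zero at power-up and advances steadily at that rate; the helper names are hypothetical, and a year is taken as 365.25 days.

```cpp
#include <cassert>
#include <cstdint>

// Value copied from Completions.cpp: 2^48 - 1 ticks.
constexpr std::uint64_t kHalCompletionIdleValue = 0x0000FFFFFFFFFFFFull;

// Hypothetical helper: seconds for a tick count that starts at 0 and
// advances at tickHz to reach idleTicks.
constexpr double SecondsToIdle(std::uint64_t idleTicks, double tickHz) {
    return static_cast<double>(idleTicks) / tickHz;
}

// Same span expressed in years of 365.25 days.
constexpr double YearsToIdle(std::uint64_t idleTicks, double tickHz) {
    return SecondsToIdle(idleTicks, tickHz) / (365.25 * 24.0 * 3600.0);
}

// The comment's premise: at 25 kHz the idle value is more than 356 years out.
static_assert(YearsToIdle(kHalCompletionIdleValue, 25000.0) > 356.0,
              "idle value should be more than 356 years out at 25 kHz");
```

At 25 kHz this gives just under 357 years, matching the comment. At the DGT's 5 MHz the same value is still under two years of uptime away, so a hang after a few minutes points to the software tick count advancing far faster than the nominal clock.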

MSM8960's timers run at 32 kHz (GPT1) and 20 MHz (DGT, the debug timer counter, currently being run at 5 MHz). The program hangs after around 4-5 minutes (for reference, 256 seconds = 4 minutes 16 seconds). KraitTIME.cpp uses a 64-bit counter value, but the APCS_DGT_CNT register is 32 bits. Register APCS_DGT_EN = 0x3, so the DGT timer resets when it reaches the value 0x88 stored in register APCS_DGT_MTCH. KraitTIME.cpp assumes APCS_DGT_EN = 0x1, and it assumes APCS_DGT_CNT rolls over at 2^32.

However, because APCS_DGT_EN = 0x3, the DGT timer rolls over every 27.2 microseconds (0x88 = 136 ticks at 5 MHz). The rollover is detected by software that increments the 33rd bit of UINT64 m_lastRead. If we assume perfect rollover detection, then the upper word of m_lastRead is incremented every 0.0036992 seconds (136 × 27.2 µs, about 270.3 times per second), and HAL_Completion_IdleValue, whose upper word is 0xFFFF, corresponds to 0xFFFF increments / 270.3 increments per second ≈ 242.4 seconds ≈ 4 minutes 2 seconds. Bingo.
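The sketch below models the mechanism in plain C++: a hypothetical ExtendedCounter that, like m_lastRead, treats any backwards step in the 32-bit count as a full 2^32 wrap, plus constants that reproduce the timing figures quoted above. The class and function names are illustrative, not the actual KraitTIME.cpp code, and the upper-word period is taken from the comment as given rather than derived independently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not the actual KraitTIME.cpp code) of extending a
// 32-bit hardware count to 64 bits: the hardware value is the low word, and
// any backwards step is treated as one full 2^32 wrap, which adds 1 to the
// upper word (the "33rd bit").
struct ExtendedCounter {
    std::uint64_t lastRead = 0;  // plays the role of m_lastRead

    std::uint64_t Update(std::uint32_t hwCount) {
        const std::uint32_t lastLow = static_cast<std::uint32_t>(lastRead);
        std::uint64_t upper = lastRead & 0xFFFFFFFF00000000ull;
        if (hwCount < lastLow) {
            upper += 0x100000000ull;  // count went backwards: assume a 2^32 wrap
        }
        lastRead = upper | hwCount;
        return lastRead;
    }
};

// Figures quoted in the comment above.
constexpr double kDgtHz = 5.0e6;           // DGT rate as currently run
constexpr std::uint32_t kDgtMatch = 0x88;  // APCS_DGT_MTCH value (136 ticks)

// Hardware wrap period when the DGT resets at the match value (27.2 us).
constexpr double DgtWrapSeconds() { return kDgtMatch / kDgtHz; }

// Upper-word increment period, taken from the comment as 136 x 27.2 us.
constexpr double UpperWordPeriodSeconds() { return kDgtMatch * DgtWrapSeconds(); }

// Time for the upper word to reach 0xFFFF, the upper word of
// HAL_Completion_IdleValue (about 242.4 s).
constexpr double SecondsToIdleUpperWord() { return 0xFFFF * UpperWordPeriodSeconds(); }
```

For example, feeding Update() the value 0x87 and then 0 increments the upper word by one: with APCS_DGT_EN = 0x3, a reset after only 136 ticks is booked as a full 2^32 wrap, so the 64-bit value races toward HAL_Completion_IdleValue far faster than real time.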
When HAL_COMPLETION::DequeueAndExec() reaches the end of the queue, it calls HAL_Time_SetCompare_Completion(HAL_Completion_IdleValue), which is not yet implemented for Krait.
Lots of drivers call HAL_COMPLETION::Abort(), including CLR_RT_Thread::Execute(...) --> Events_SetBoolTimer(...). But HAL_COMPLETION::Abort() on an empty completion list passes HAL_Completion_IdleValue to HAL_Time_SetCompare(). Once the software tick count has climbed close to HAL_Completion_IdleValue, this sets the compare time either very soon in the future or in the past. If it is in the past, Krait_TIME_Driver::SetCompareValue(...) sets fForceInterrupt = true and calls Krait_TIMER_Driver::SetCompare with a time very close in the future.
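A minimal sketch of this failure mode, assuming a monotonically increasing 64-bit software tick count. SetCompareValueSketch, CompareResult and the minDelayTicks parameter are hypothetical stand-ins for the behaviour described for Krait_TIME_Driver::SetCompareValue(...), not the real driver API.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kIdleValue = 0x0000FFFFFFFFFFFFull;  // HAL_Completion_IdleValue

// Hypothetical outcome of arming the compare: whether an interrupt had to be
// forced, and how many ticks after `now` the hardware compare will fire.
struct CompareResult {
    bool forceInterrupt;
    std::uint64_t ticksUntilFire;
};

// Sketch of the described behaviour: a target that is already in the past is
// replaced by a short fixed delay (minDelayTicks, an assumed parameter) and
// the interrupt is forced; otherwise the compare fires at the target.
CompareResult SetCompareValueSketch(std::uint64_t now, std::uint64_t target,
                                    std::uint64_t minDelayTicks) {
    if (target <= now) {
        return CompareResult{true, minDelayTicks};
    }
    return CompareResult{false, target - now};
}
```

Early in uptime the idle compare is effectively "never"; as the software tick count nears HAL_Completion_IdleValue it fires almost immediately; and once the count passes it, every Abort() on an empty list re-arms a compare in the past and forces another near-immediate interrupt.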

So the solution is to change register APCS_DGT_EN to 0x1 and test. There may still be other problems, too (such as HAL_Time_SetCompare_Completion being stubbed).
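To make the proposed change concrete, here is a small model of the counter under the two APCS_DGT_EN values. The names kDgtEnable, kDgtClearOnMatch and DgtStep() are hypothetical, and the bit meanings are only inferred from the 0x1 vs 0x3 behaviour described in this thread, so check the MSM8960 documentation before relying on them.

```cpp
#include <cassert>
#include <cstdint>

// Assumed bit meanings for APCS_DGT_EN, inferred only from the behaviour
// described in this issue (not taken from the MSM8960 manual).
constexpr std::uint32_t kDgtEnable       = 0x1;  // counter runs
constexpr std::uint32_t kDgtClearOnMatch = 0x2;  // counter resets at APCS_DGT_MTCH

// Hypothetical single-tick model of APCS_DGT_CNT.
std::uint32_t DgtStep(std::uint32_t count, std::uint32_t enReg, std::uint32_t match) {
    if ((enReg & kDgtEnable) == 0) {
        return count;  // timer stopped
    }
    if ((enReg & kDgtClearOnMatch) != 0 && count + 1 >= match) {
        return 0;      // 0x3: wraps every `match` ticks (136 ticks for 0x88)
    }
    return count + 1;  // 0x1: free-running; unsigned overflow wraps at 2^32
}
```

With 0x1 the counter only wraps after 2^32 ticks (about 859 seconds at 5 MHz), which is the rollover behaviour KraitTIME.cpp expects.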

MichaelAtSamraksh commented 10 years ago

See commit 10f513b56635fee8e9950fc3dcb2a80735f347ed. It looks like the fix was added there but commented out before committing.