NETMF / netmf-interpreter

.NET Micro Framework Interpreter
http://netmf.github.io/netmf-interpreter/

Question about Multithreading #506

Open Michaelb-NGC opened 8 years ago

Michaelb-NGC commented 8 years ago

This was a post that I made at GHI, but I was referred to this forum as it tends to dig in deeper than the hobbyist does, but not as deep as a professional programmer. I am slowly getting a better understanding of how all of this works, and appreciate any help. Thanks.

It was my understanding that the thread scheduler ran at the CLR level. But when digging into the 1-Wire protocols at the PAL layer, I ran into this C++:

//timing critical, so I'll disable interrupts here
GLOBAL_LOCK(irq); //EA = 0;

Then, in reading more, I find:

As of v4.4 the updated port of the lwIP network stack requires a minimal pre-emptive thread scheduler

So, my questions are:

  1. Does the CLR thread scheduler use IRQs to decide when to switch threads?
  2. If there is a long enough delay added in the PAL layer, will it allow a separate thread to run while it's paused?
  3. If a PAL routine is started, but its execution takes longer than the thread time of the thread it is called from, will it swap threads or wait until the execution is finished?
  4. Same question as #3, but for the HAL layer, i.e. C++:

BOOL __section("SectionForFlashOperations")STM32_Flash_Driver::EraseBlock( void* context, ByteAddress Sector )

as it seems that erasing / programming flash would be very critical?

Thanks

smaillet-ms commented 8 years ago

Hopefully I can provide answers:

  1. Yes, sort of 😉. The CLR uses the HAL completion/continuation mechanism, which is ordinarily driven by an interrupt on a microcontroller. However, there is no direct requirement for that, as NETMF is designed to run on a variety of platforms (including Windows user mode code, where you don't have access to interrupts; the emulator is actually an instance of the NETMF interpreter running as a Windows application). Normally, in a microcontroller port, an interrupt from a timer triggers a callback that sets a flag in an event mask. The only requirement the CLR has is that something in the HAL/PAL sets that flag on a periodic basis. When the CLR completes interpreting an instruction and is about to fetch the next one, it checks for the reschedule event flag and, if needed, swaps out the active thread for a new one.
  2. Code that runs in the HAL or PAL is generally single threaded. With v4.4 and the network stack that generalization isn't actually true. However, the CLR interpreter is always a single thread. ALL managed code execution is suspended while in the PAL/HAL. The multi threaded support used in v4.4 in the HAL is to better handle the asynchronous nature of a network stack. The interpreter itself operates on one native thread and is generally oblivious to the presence of the RTOS.
  3. Native code owns the CPU as far as the CLR is concerned. Managed code threads are not tied to an actual native thread in any way. They are handled entirely by the CLR as it interprets the instructions. The event flag set from the HAL indicates that the CLR should schedule a new thread, and the process of switching the thread is a simple change in the instruction pointer and stack frame. (Since the .NET virtual machine has no registers, there isn't much in the way of context saving required.) Thus, if a PAL routine sits in a while(true) loop, it will usually block the CLR from running. The exception is if an RTOS or other multi-threaded OS is used underneath AND the code in question is running on a native thread different from the CLR interpreter. In that case it wouldn't block the CLR thread from running.
  4. Same as #3. There's no distinction in the threading between HAL and PAL layers.
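
As a rough illustration of the mechanism described in the answers above (not NETMF source code), the sketch below models managed threads as nothing more than an instruction pointer, with a single native loop that checks a reschedule flag between instructions. Every name here (`g_eventFlags`, `OnQuantumTimerTick`, `ManagedThread`, `RunInterpreter`) is hypothetical, and the periodic timer interrupt is simulated by calling the callback every `tickEvery` instructions.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names throughout; none of these are NETMF identifiers.
constexpr std::uint32_t kEventReschedule = 1u << 0;

// Event mask shared between the "HAL" (timer callback) and the interpreter.
std::atomic<std::uint32_t> g_eventFlags{0};

// Stand-in for a timer ISR / HAL completion: it only sets a flag.
void OnQuantumTimerTick() {
    g_eventFlags.fetch_or(kEventReschedule);
}

// A managed thread is just interpreter state: an instruction pointer
// (the stack frame is omitted here), not a native thread.
struct ManagedThread {
    std::size_t ip = 0;             // next instruction to interpret
    std::size_t programLength = 0;  // number of instructions in the program
    bool Done() const { return ip >= programLength; }
};

// Runs all managed threads on ONE native thread. `tickEvery` simulates the
// periodic timer by firing the callback every N interpreted instructions.
// Returns the thread index that executed each instruction, in order.
std::vector<std::size_t> RunInterpreter(std::vector<ManagedThread>& threads,
                                        std::size_t tickEvery) {
    assert(tickEvery > 0);
    std::vector<std::size_t> trace;
    std::size_t current = 0, executed = 0, remaining = 0;
    for (const auto& t : threads) {
        if (!t.Done()) ++remaining;
    }

    while (remaining > 0) {
        ManagedThread& t = threads[current];
        if (!t.Done()) {
            ++t.ip;  // "interpret" one instruction
            trace.push_back(current);
            if (t.Done()) --remaining;
            if (++executed % tickEvery == 0) OnQuantumTimerTick();
        }
        // Between instructions: atomically read and clear the reschedule flag.
        const std::uint32_t previous = g_eventFlags.fetch_and(~kEventReschedule);
        const bool reschedule = (previous & kEventReschedule) != 0;
        if (reschedule || t.Done()) {
            // A "context switch" is only a change of instruction pointer/frame.
            current = (current + 1) % threads.size();
        }
    }
    return trace;
}
```

In this model, a native routine that never returns would sit inside the "interpret one instruction" step, so the loop would never reach the flag check. That is the sense in which native code blocks every managed thread unless it runs on a different native thread.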

josesimoes commented 8 years ago

Considering the known lack of documentation these are words of wisdom that might get lost in the issue threads. Steve (@smaillet-ms) can I suggest that you do a simple copy/paste of this question and your explanations into a new Wiki entry?

cw2 commented 8 years ago

+1

Also, there is a special mechanism involving CLR_RT_StackFrame::m_customState that is worth some explanation too... (IIRC it is some kind of re-entrancy, used in I2C and serial drivers, among other occurrences).

Michaelb-NGC commented 8 years ago

@smaillet-ms Wow, thanks. This has been the most in-depth help I've had since getting involved in NETMF (not trying to put aside those who have helped before).

Before I ask my next question, I do have a small soap box to stand on: this is definitely like breaking into an OLD TIMERS CLUB. Just getting a basic understanding of what's going on is frustrating, and it means going back many versions and then catching back up. Some of the questions I'm asking aren't just for me, but to create a trail of bread crumbs for people following me. I loved josesimoes' idea of creating a new wiki entry, as I'd have no problem continuing to contribute questions (and not just questions for the sake of questions, but ones that document someone's learning, so others can follow, or I can follow others). And the more I learn... That aside, the more this benefits corporate, and as I've echoed before, if corporate is using it, then why not be able to contribute money to it? I'm not expecting to get something for free, but errr... That's my soap box.

So, back to multithreading. If these two commands are called:

_rawDevice.RawBus.WriteByte(Command.MatchRom);
WriteBytes(_rawDevice.Address);

(where statistically this lock gets called the most) and then

GLOBAL_LOCK(irq); //EA = 0;

gets called, can a separate HW interrupt already have been raised by the serial handler, so that you end up with a race condition where all IRQs are disabled and everything hangs? I posted a separate 1-Wire question on GHI, but got intrigued with what was happening with the first post back, as it helps me understand how to code for it.

smaillet-ms commented 8 years ago

@Michaelb-NGC I hear ya on the soap box. The history of NETMF is such that it was originally believed that only a small number of entities would be porting the framework versus the number using an existing port on available hardware. So the porting process never received the same level of documentation as the application development did. That's a major reason why I've focused on starting the vNext branch with documentation of the internals.

I'm not sure I fully follow your example.

GitHub note: When including code you can mark it as code using markdown with triple tick characters - the special single back quote character under the ~ on a standard layout keyboard or you can use the button "<>" on the GitHub page in the editor region to insert code so it formats like real code.

GLOBAL_LOCK() isn't necessarily interrupts off. It is probably best thought of as an acquire/release pair for a static global mutex protecting the internal state of the interpreter, including event flags. It is a macro whose definition is controlled by the platform's settings header file. Normally it uses the grossly misnamed SmartPtr_IRQ class, which is defined for a given platform to acquire the lock in its constructor and release it in the destructor (classic C++ RAII pattern). Typically the platform_selector.h has something along these lines in it:

#define GLOBAL_LOCK(x)             SmartPtr_IRQ x
#define DISABLE_INTERRUPTS()       SmartPtr_IRQ::ForceDisabled()
#define ENABLE_INTERRUPTS()        SmartPtr_IRQ::ForceEnabled()
#define INTERRUPTS_ENABLED_STATE() SmartPtr_IRQ::GetState()
#define GLOBAL_LOCK_SOCKETS(x)     SmartPtr_IRQ x

#if defined(_DEBUG)
#define ASSERT_IRQ_MUST_BE_OFF()   ASSERT(!SmartPtr_IRQ::GetState())
#define ASSERT_IRQ_MUST_BE_ON()    ASSERT(SmartPtr_IRQ::GetState())
#else
#define ASSERT_IRQ_MUST_BE_OFF()
#define ASSERT_IRQ_MUST_BE_ON()
#endif

I say SmartPtr_IRQ is misnamed as it isn't particularly smart, nor a pointer, and only historically has anything to do with IRQs. While it is common for microcontrollers to use interrupts off for this, doing so when an RTOS is in place is a tad crude. Disabling interrupts is a big huge double-sided axe typically swung by an Orc. With modern processors and runtimes it is rarely ever needed and better options almost always exist. Disabling interrupts also messes with and complicates the real-time guarantees of the system.

In fact if you look at the implementation of SmartPtr_IRQ for the 'Windows' solution you will find it is based on a Win32 CRITICAL_SECTION.

If, as in most of the micro controller ports, SmartPtr_IRQ is implemented as interrupts off, then nothing else will run until the lock is released in the destructor at scope exit.
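
To make the RAII shape concrete, here is a minimal sketch of what a hosted (non-microcontroller) lock class of this kind could look like. It is not the NETMF implementation: `ScopedGlobalLock`, `DEMO_GLOBAL_LOCK`, and `UpdateSharedState` are hypothetical names, and `std::recursive_mutex` is used as a portable stand-in for a Win32 CRITICAL_SECTION, which can likewise be re-entered by the thread that owns it.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a hosted SmartPtr_IRQ-style class.
// The constructor acquires the global lock; the destructor releases it.
class ScopedGlobalLock {
public:
    ScopedGlobalLock() { Mutex().lock(); ++Depth(); }
    ~ScopedGlobalLock() { --Depth(); Mutex().unlock(); }

    ScopedGlobalLock(const ScopedGlobalLock&) = delete;
    ScopedGlobalLock& operator=(const ScopedGlobalLock&) = delete;

    // Nesting depth held by the calling thread (0 means "not held").
    static int HeldDepth() { return Depth(); }

private:
    // One process-wide mutex, created on first use.
    static std::recursive_mutex& Mutex() {
        static std::recursive_mutex m;
        return m;
    }
    // Per-thread nesting counter, used only for illustration and checks.
    static int& Depth() {
        thread_local int depth = 0;
        return depth;
    }
};

// Same shape as the GLOBAL_LOCK(x) macro quoted earlier, bound to the sketch class.
#define DEMO_GLOBAL_LOCK(x) ScopedGlobalLock x

int g_sharedState = 0;

// The lock is taken at the declaration and released at scope exit,
// including on early returns.
int UpdateSharedState(int delta) {
    DEMO_GLOBAL_LOCK(irq);
    assert(ScopedGlobalLock::HeldDepth() >= 1);
    g_sharedState += delta;
    return g_sharedState;
}
```

On a port where the class disables interrupts instead, the constructor and destructor would save and restore the interrupt state rather than lock a mutex, which is why the time spent inside the scope matters so much there.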

The general idea for GLOBAL_LOCK() in NETMF was to prevent interrupt handlers from getting in the middle of a read+modify+write back sequence of critical internal interpreter data structures, especially the event flags, which are just a 32-bit value with each bit representing the state of one of the event notifications. In many cases interrupt handlers ultimately lead to notification of events to the interpreter, and the interpreter has to clear bits after handling them. Thus a way to prevent interrupting the multi-instruction update cycle is needed. (Ideally this is some sort of InterlockedCompareExchange operation, but the older microcontroller instruction sets didn't support such operations without disabling interrupts.) Unfortunately there is still a lot of legacy HAL code that has abused the GLOBAL_LOCK() mechanism and ends up disabling interrupts for rather long periods of time. That's all code ripe for cleanup.
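
As a sketch of the lock-free alternative mentioned in that parenthesis, the example below keeps the event flags in a `std::atomic<std::uint32_t>` and updates them with single atomic read-modify-write operations. The bit names and functions (`kEventTimer`, `kEventSerial`, `SignalEvents`, `ConsumeEvents`, `ConsumeEventsCas`) are hypothetical and do not come from the NETMF sources.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical event bits; the real NETMF event definitions are not shown here.
constexpr std::uint32_t kEventTimer  = 1u << 0;
constexpr std::uint32_t kEventSerial = 1u << 1;

std::atomic<std::uint32_t> g_events{0};

// Called from an "interrupt handler": atomically set one or more bits.
void SignalEvents(std::uint32_t bits) {
    g_events.fetch_or(bits, std::memory_order_release);
}

// Called from the interpreter: atomically clear the requested bits and
// return which of them were set. fetch_and is a single read-modify-write,
// so a SignalEvents call cannot land in the middle of it.
std::uint32_t ConsumeEvents(std::uint32_t mask) {
    const std::uint32_t previous =
        g_events.fetch_and(~mask, std::memory_order_acq_rel);
    return previous & mask;
}

// The same operation as an explicit compare-exchange retry loop, closer in
// spirit to an InterlockedCompareExchange-style primitive.
std::uint32_t ConsumeEventsCas(std::uint32_t mask) {
    std::uint32_t observed = g_events.load(std::memory_order_relaxed);
    while (!g_events.compare_exchange_weak(observed, observed & ~mask,
                                           std::memory_order_acq_rel,
                                           std::memory_order_relaxed)) {
        // On failure `observed` holds the current value; retry with it.
    }
    return observed & mask;
}
```

Whether this is usable from an interrupt handler on a particular target depends on the atomic being lock-free there, which can be checked with `g_events.is_lock_free()`; where it is not, a short interrupts-off section around the update remains the fallback.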

In vNext we need to document the mechanisms, patterns, and semantics for these kinds of things better, as well as be more prescriptive about what should and shouldn't be done.

smaillet-ms commented 8 years ago

@cw2 Unfortunately I've hit a page-fault in my memory on the custom state and the data can't be fetched from cache or local disk. I'll need to resort to recovery from a tape back up. 😉

Translation: I have little recollection of the details on that and I'd need to look into it before I could provide any more useful information.

Michaelb-NGC commented 8 years ago

@smaillet-ms "Unfortunately I've hit a page-fault in my memory on the custom state and the data can't be fetched from cache or local disk. I'll need to resort to recovery from a tape back up. 😉"

Thanks for the quick and informative reply, and I'm officially giving notice: I'm going to steal your quote for future use...