RTOS2 : question about critical region

gerhil commented 7 years ago

Why isn't it allowed to use osThreadNew(), osThreadSuspend() and osThreadResume() in critical regions? See rtx_thread.c lines 1505, 1589 and 1599.

0xc0170 commented 7 years ago

@Wazalogic have you tried it to run with without these conditions? As I recall it hard-faults, as it tries to make svc calls.

JonatanAntoni commented 7 years ago

In case of RTX5 we use SVC calls to implement this management functions, yes. Thus interrupts must not be disabled. Can you explain your use case for placing osThreadXXX calls in critical regions, i.e. having interrupts globally disabled?

ilg-ul commented 7 years ago

where are documented the critical sections in CMSIS OS2? I asked for it many times in the past, but I don't remember receiving an answer.

generally speaking, the implementation of critical sections does not necessarily imply having interrupts globally disabled; on v7m devices you can set the PRIMASK.

any RTOS must have clear rules on how to enter/exit interrupts critical sections, and the mechanism must allow unlimited nesting; similar to scheduler critical sections, the safest implementation is to read the current status and save it to a variable, disable interrupts or change PRIMASK, do the critical code, and restore the saved status.

CMSIS++ has clearly defined functions for this:

http://micro-os-plus.github.io/reference/cmsis-plus/classos_1_1rtos_1_1interrupts_1_1critical__section.html#adbacb147546e7166311b22c960f41e6a

and for the C API:

http://micro-os-plus.github.io/reference/cmsis-plus/group__cmsis-plus-rtos-c-core.html#gaeb1888b2cebfa7d7beb7b1590633033a

gerhil commented 7 years ago

In my project I need to perform context switching between specific threads as follows

core_util_critical_section_enter(); osThreadResume(newThread); osThreadSuspend(otherThread); core_util_critical_section_exit();

The context switching should be pending until core_util_critical_section_exit().

The interrupt serivce routine can wake up a thread, using osThreadResume(thread), and context switching should not happen if the program is in a critical region.

gerhil commented 7 years ago

any RTOS must have clear rules on how to enter/exit interrupts critical sections, and the mechanism must allow unlimited nesting; similar to scheduler critical sections, the safest implementation is to read the current status and save it to a variable, disable interrupts or change PRIMASK, do the critical code, and restore the saved status.

I agree. FreeRTOS also supports this behaviour.

gerhil commented 7 years ago

I understand that this is limited by the SVC call implementation. However, from a design perspective this limitation is a design flaw. The semantics of the API functions should allow these functions in a critical region. Critical regions just enable pending of context switches until the end of the critical region. For example, FreeRTOS implements this semantics and it gives good control of threads. This is what multithreading is about! The fact that critical regions, osThreadSuspend and osThreadResume are very dangereous to use, should be no reason to exclude or limit them.

ilg-ul commented 7 years ago

FreeRTOS also supports this behaviour.

yeah, just that the implementation is suboptimal, it uses a counter, which does not play well at all if interrupts were disabled outside their RTOS (exit() will see the counter to be zero, and prematurely enable the interrupts).

as I said, the only absolutely safe implementation is to save the status on enter() and restore it on exit() (as CMSIS++ does).

ilg-ul commented 7 years ago

Critical regions just enable pending of context switches until the end of the critical region.

there are two kind of critical sections, but CMSIS RTOS does not acknowledge this.

there are scheduler critical sections, which prevent context switches, and interrupt critical sections, which prevent interrupts with lower priority.

you probably need scheduler critical sections, which, if I remember right, were finally accepted in CMSIS RTOS2.

gerhil commented 7 years ago

Yes. I can probably use any of them because I keep the critical sections very very small.

gerhil commented 7 years ago

@0xc0170 I have disabled the critical section and the code runs ok. However, I need the critical section to prevent race conditions.

ilg-ul commented 7 years ago

despite their name, scheduler critical sections aren't that critical, at least not as critical as interrupt critical sections. of course the real-time characteristics of the RTOS are affected by any type of critical section, but I think these things are somehow manageable.

ReinhardKeil commented 7 years ago

Hi gerhil,

can you provide a bit more information about this code section. What is behind the functions: core_util_critical_section_enter() and core_util_critical_section_exit(). Can you provide the source code of that functions?

Why is it important to resume context switch when you are in a critical section. I would expect that the opposite is important? I would assume that you do not want any context switches in a critical section.

Why does such a sequence not work?

osThreadSuspend(myThread);
core_util_critical_section_enter();
   // execute stuff that should not get interrupted by thread switches
core_util_critical_section_exit();
osThreadResume(myThread);

ilg-ul commented 7 years ago

What is behind the functions: core_util_critical_section_enter() and core_util_critical_section_exit(). Can you provide the source code of that functions?

this is a good example why the code for interrupt critical sections should be in the RTOS public API, because applications need it, and not having it in the API leads to re-implementations in the application, most of the time inconsistent with the system.

gerhil commented 7 years ago

Hi Reinhard, I came from mbed. I used their code base (feature_cmsis5) which is derived from ARM’S cmsis5. They couldn’t help me and therefore they sent me here.

The functions core_util_critical_section_enter() and core_util_critical_section_exit() come from mbed_critical.c. Sorry that I didn’t mentioned it. These functions use __disable_irq() and __enable_irq() from rt_HAL_CM.h. I have tried these functions as well and I got same problem.

The critical region should disable context switching. This is what I would expect. In my case, the osThreadResume() and osThreadSuspend() must be performed in sequence without context switching in between. Also, the osThreadResume() and osThreadSuspend() must not yield in a critical region. If this is not the case then a context switch after osThreadResume() and before osThreadSuspend() could allow another thread (e.g. the one being resumed) to resume this thread that has not been suspended yet. This is a race condition causing deadlock. FYI, the sequence ‘osThreadSuspend();osThreadResume()’ is also a possible solution, but this may perform unnecessary overhead of going through the idle state. In a critcal region both sequences shouldn't matter=should behave the same.

I looked at osThreadGetPriority(). It seems to me that you cannot call osThreadGetPriority() in a critical region. Why not? Isn’t that odd? I am missing sound semantics in the API of cmsis5.

Gerald.

ReinhardKeil commented 7 years ago

Thanks for the clarification - that was important information for us.

For disabling thread switching, you may use osKernelLock / os KernelUnlock. See also http://arm-software.github.io/CMSIS_5/RTOS2/html/group__CMSIS__RTOS__KernelCtrl.html#ga948609ee930d9b38336b9e1c2a4dfe12

This does not alter the interrupt system and should be sufficient the for majority of use cases.

gerhil commented 7 years ago

Thanks for pointing out to osKernelLock/osKernelUnlock. However, this does not work with osThreadResume() and osThreadSuspend() as expected.

Is it better to use osEventFlagsSet() and osEventFlagsWait() instead of resp. osThreadResume() and osThreadSuspend()? How does this behave within osKernelLock/osKernelUnlock?

ReinhardKeil commented 7 years ago

Here is a bit more background:

As you perhaps know, RTX5 is designed to be extended with MPU protection. As such RTX functions run in handler mode (called via SVC). In an MPU protected system, most threads should run in unprivileged mode and directly enabling/disabling IRQ is impossible. We are in the process get a safety certification for RTX5 and therefore we are reluctant to add RTOS function that enable/disable all IRQ as this introduces also safety risk. Incorrect handling may result in a faulty system since interrupt response times become unpredictable in other parts of the application.

Using osKernelLock and osKernelUnlock allows to disable thread switching and is usually what is needed. But I'm open to get more information of real use cases.

While IRQs are disabled, the SVC calls to RTOS are also blocked. For important RTOS functions that should be callable from IRQ functions, a special handling is implemented, however this adds overhead. We believe that it should be not required to call osThreadGetPriority from IRQ, but again I'm open for feedback to hear about potential use cases.

In general, a good design approach should minimize the code that is executed in the IRQ handler itself and the RTOS helps you to do that. Therefore disabling all interrupts in a system (except in the system startup phase) should be never required.

gerhil commented 7 years ago

Thanks again. I see, I hit the protection mechanism. I agree, we should be very careful with functions like __disable_irq() and ISRs must be short. How can I perform my code (context switching) in unprivileged mode or can I switch to privileged mode?

Consider the following example. Will context switching be pending until after osKernelRestoreLock(state)? If so, then this could solve my problem.

uint32_t state = osKernelLock();
osEventFlagsSet(…);
osEventFlagsWait(…);
osKernelRestoreLock(state);

BTW: I am not using an osThreadGetPriority() in an ISR. I wonder if this protection is necessary for all functions.

JonatanAntoni commented 7 years ago

Hi @gerhil, I admit the documentation is osThreadResume() and osThreadSuspend() is not as the best. What did you expect? And what did you get? I made a small Blinky example with two threads and tried to alter between them using

osKernelLock();
osThreadResume(tid_threadB);
osThreadSuspend(tid_threadA);
osKernelUnlock();

and this works as I expect. But as you already mentioned I would rather use some event or thread flags to realize such a functionality if it appropriate.

gerhil commented 7 years ago

I am doing the same, but there seems to be a glitch that performs a yield inside the critical region. My code is too big to post here.

I try to make an example....

ilg-ul commented 7 years ago

there seems to be a glitch that performs a yield inside the critical region

what glitch?

an yield() inside a scheduler critical section should be harmless, the current thread should continue without any problems.

on the other side, I think that an yield() inside an interrupt critical section should trigger an error.

gerhil commented 7 years ago

@ilg-ul I mean yield performed by osThreadResume() and osThreadSuspend() inside the critical section.

Example will follow...

gerhil commented 7 years ago

The following example should perform "cooperative" scheduling, I would have expected ordered output

Task1
Task2
Task1
Task2

etc. I get disordered results. Preempting results.

Example:

osThreadId_t tid_thread1;
osThreadId_t tid_thread2;

void task1(void *parameter) {
    tid_thread1 = osThreadGetId();
    osThreadSuspend(tid_thread1);
    while(1) {
        printf ("Task1\r\n");
        fflush(stdout);
        osKernelLock();
        osThreadResume(tid_thread2);
        osThreadSuspend(tid_thread1);
        osKernelUnlock();
    }
}

void task2(void *parameter) {
    tid_thread2 = osThreadGetId();
    osThreadSuspend(tid_thread2);
    while(1) {
        printf ("Task2\r\n");
        fflush(stdout);
        osKernelLock();
        osThreadResume(tid_thread1);
        osThreadSuspend(tid_thread2);
        osKernelUnlock();
    }
}

int main() {
    os_thread_t cb_thread1;
    memset(&cb_thread1, 0, sizeof(cb_thread1));
    osThreadAttr_t thread_attr1;
    memset(&thread_attr1, 0, sizeof(thread_attr1));
    thread_attr1.name = "thread1";
    thread_attr1.stack_size = 72U + (256U*4U);
    thread_attr1.stack_mem = new uint8_t[thread_attr1.stack_size];
    thread_attr1.priority = osPriorityNormal;
    thread_attr1.cb_mem = &cb_thread1;
    thread_attr1.cb_size = sizeof(os_thread_t);

    os_thread_t cb_thread2;
    memset(&cb_thread2, 0, sizeof(cb_thread2));
    osThreadAttr_t thread_attr2;
    memset(&thread_attr2, 0, sizeof(thread_attr2));
    thread_attr2.name = "thread2";
    thread_attr2.stack_size = 72U + (256U*4U);
    thread_attr2.stack_mem = new uint8_t[thread_attr2.stack_size];
    thread_attr2.priority = osPriorityNormal;
    thread_attr2.cb_mem = &cb_thread2;
    thread_attr2.cb_size = sizeof(os_thread_t);

    tid_thread1 = osThreadNew((osThreadFunc_t)task1, NULL, &thread_attr1);
    if (tid_thread1 == NULL) {
        printf("Thread1 not created!\r\n");
    }

    tid_thread2 = osThreadNew((osThreadFunc_t)task2, NULL, &thread_attr2);
    if (tid_thread2 == NULL) {
        printf("Thread2 not created!\r\n");
    }
    osDelay(1000); // Make sure both tasks are executed and suspended.
    osThreadResume(tid_thread1); // Start Task1 first.

for(;;) { osDelay(1000); }
return 0;
}

JonatanAntoni commented 7 years ago

@gerhil Yes, i think your example reveals the "problem". You suspend the active thread while the scheduler is locked. Suspending a thread puts this thread to be blocked immediately. Thus osThreadSuspend(self) is not expected to return until one issues osThreadResume() for this thread.

My humble opinion is you should not rely on such a design. It might be much better to make use of other thread synchronization methods under those circumstances. Did you try to solve this using thread or event flags, yet?

gerhil commented 7 years ago

This shows that osThreadSuspend() and osThreadResume() are incomplete. I wonder if event flags will be useful this way. What will osEventFlagsWait(…) do in a critical section? Same as osThreadSuspend()?

I agree. This is not a design I will ever be using in an application. I am using this technique for a new type of real-time kernel on top of cmsis.

Have you heard of “Communicating Sequential Processes” (CSP) by Tony Hoare and Bill Roscoe? It’s a theory of programming that provides true concurrency. It’s more than a process algebra. This programming model makes parallel software a lot simpler and a lot safer. Concurrency is not multithreading and it is not parallelism. Also, it does not mean “same time”. Concurrency is different. I am developing a new programming model based on CSP for C++ (and other modern programming languages). I want to make this programming model available for the ARM processor. Currently, it works with FreeRTOS for ARM, but I want to make it useful with cmsis-rtos. To make this work with cmsis, I need this type of context switching to work.

ilg-ul commented 7 years ago

This shows that osThreadSuspend() and osThreadResume() are incomplete

what makes them incomplete? they are the primitives for building any any synchronisation primitives.

I did not check the new CMSIS RTOS2 pages, perhaps they do not explain what these two functions are intended; suspend() is intended (or at least was intended, when I suggested them) to remove the current thread from the running list, and resume() was intended to add the given thread to the running list.

so, what makes them incomplete?

CSP

ok, this is one of the new programming models. I'm not sure it can be implemented on top of more classical APIs.

gerhil commented 7 years ago

what makes them incomplete? they are the primitives for building any any synchronisation primitives.

I mean by incomplete, their semantics in the presence of a critical section is incomplete. This discussion is also found at FreeRTOS. Pending context switching until the end of a critical section is a good solution and logical to the semantics of a critical section.

I believe that suspend and resume are thread control functions and not synchronization primitives. Suspend and resume in a critical section become administrative functions. They should not yield. Semaphores and monitors are synchronization primitives.

ok, this is one of the new programming models. I'm not sure it can be implemented on top of more classical APIs.

I build CSP with Dijkstra’s semaphores and Hoare’s monitors. I only need a critical section and suspend and resume from the cmsis kernel for context switching. The programming model is called Waza. www.wazalogic.com

ilg-ul commented 7 years ago

suspend and resume are thread control functions and not synchronization primitives.

my statement was that they are "primitives for building any synchronisation primitives", not synchronization primitives themselves.

their semantics in the presence of a critical section is incomplete

the semantics of scheduler critical sections is very simple and clear: 'no context switches are performed inside the critical section'; if you call resume(), the thread will be added to the ready list, but, being inside the scheduler critical section, the context switch is ignored for now. however the thread was enqueued in the ready list and remains there. if you call suspend(), normally the current thread is suspended and execution continues to a thread chosen by the scheduler, but, in a scheduler critical section, the current thread is resumed immediately (suspending another thread than the current thread should not be allowed). I thought that this behaviour is obvious and expected. what would you add to make it "complete"?

I don't know the details of the RTX implementation, but this was the original expected behaviour, and I still think it is minimal and orthogonal.

BTW, in your example the sequence [ lock(), resume(), suspend(), unlock() ] makes not much sense; according to the above semantics, a single resume() would be enough.

gerhil commented 7 years ago

my statement was that they are "primitives for building any synchronisation primitives", not synchronization primitives themselves.

Ah yes. Sorry you are right.

the semantics of scheduler critical sections is very simple and clear: 'no context switches are performed ...

I would also except the behavior you describe. My example shows a different behavior.

BTW, in your example the sequence [ lock(), resume(), suspend(), unlock() ] makes not much sense; according to the above semantics, a single resume() would be enough.

Only one thread should perform at a time. The example resumes another thread and suspends the current thread. All threads have equal priorities, Without the suspend the semantics is open for non-deterministic behavior.

ilg-ul commented 7 years ago

My example shows a different behavior.

then it should be a misunderstanding in the RTX implementation.

The example resumes another thread and suspends the current thread.

both these actions are performed by resume(). just try to replace in your example the scheduler critical sections with a single resume().

FYI, I did extensive studies and testing with various solutions while designing and implementing µOS++ / CMSIS++. actually the synchronisation objects in µOS++ are highly portable and can run on top of another scheduler, with the layered design of CMSIS++ using only these two suspend()/resume() functions (I have a functional version on top of the FreeRTOS scheduler).

gerhil commented 7 years ago

then it should be a misunderstanding in the RTX implementation.

Yes, that is the problem here.

both these actions are performed by resume(). just try to replace in your example the scheduler critical sections with a single resume().

Does this work with a preemptive scheduler underneath?

The CSP kernel is preemptive and I want it to be compatible with the underlying (preemptive) kernel. BTW, CSP is busy-waiting free.

Ok, I have tested your suggestion and the output seems to be time-sliced or printf is interfering. I do not get the output I would expect.

ilg-ul commented 7 years ago

or printf is interfering

btw, please be aware that newlib itself is not thread aware. using stdio functions in multi-threaded apps is a gamble.

gerhil commented 7 years ago

Why is this thread DONE?

ilg-ul commented 7 years ago

that's ARM way of telling you to not expect anything more from them.

gerhil commented 7 years ago

I am surprised and disappointed. I have to tell my customers not to used rtx2/cmsis5/mbed.

ilg-ul commented 7 years ago

I've already been there. when RTOS1 was out, I was also optimistic and recommended it to my customers. I even had a RTOS1 API for my µOS++. but it was hopeless, the RTOS1 API triggered many protests and I had to abandon it.

RTOS2 is slightly better, but I still can't recommend it... :-(

(not to mention that in the mean time, given the choice, my customers discovered that the C API is somehow redundant and the µOS++ native C++ API is used now quite successfully).

ReinhardKeil commented 7 years ago

Gerald

I believe you did accidentally set the thread to DONE.

I was actually surprised by that as well. Please reopen it.

Reinhard

From: Gerald Hilderink notifications@github.com Sent: Monday, April 3, 2017 3:58:34 PM To: ARM-software/CMSIS_5 Cc: Reinhard Keil; Comment Subject: Re: [ARM-software/CMSIS_5] RTOS2 : question about critical region (#171)

Closed #171https://github.com/ARM-software/CMSIS_5/issues/171.

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/ARM-software/CMSIS_5/issues/171#event-1026522647, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AH4pGkF3JGxdW-ZLXgmy78D8nAHJKc6jks5rsPsKgaJpZM4MrYSB.

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

ReinhardKeil commented 7 years ago

Correction: you perhaps accidentally CLOSED the ticket.

From: Gerald Hilderink notifications@github.com Sent: Monday, April 3, 2017 9:34:03 PM To: ARM-software/CMSIS_5 Cc: Reinhard Keil; Comment Subject: Re: [ARM-software/CMSIS_5] RTOS2 : question about critical region (#171)

I am surprised and disappointed. I have to tell my customers not to used rtx2/cmsis5.

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/ARM-software/CMSIS_5/issues/171#issuecomment-291249492, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AH4pGrFZ2W6PCeCPyfLEGSiXmATk0airks5rsUmrgaJpZM4MrYSB.

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

gerhil commented 7 years ago

Reinhard

JonatanAntoni did set the thread to DONE.

JonatanAntoni added DONE and removed review labels 6 hours ago

I found the button Reopen 😊

RobertRostohar commented 7 years ago

@gerhil

the semantics of scheduler critical sections is very simple and clear: 'no context switches are performed ...

I would also except the behavior you describe. My example shows a different behavior.

I have tried your example and it works as expected. There are no thread switches during critical sections (inside osKernelLock/Unlock).

JonatanAntoni commented 7 years ago

@gerhil I am sorry for assuming the issue is DONE because you closed it by accident.

Please feel free to (reopen and) continue commenting on issues you still have concerns about. Take also note of the label descriptions we added recently to the README.

DONE does not mean we will ignore any further comments.

gerhil commented 7 years ago

@JonatanAntoni I read in the forum said that you added DONE and removed review status.

Thanks Jonatan.

@RobertRostohar Hi Robert, thank you for testing. I use mbed and perhaps there is a problem there. I will look into it.

I use mbed to create projects. I am newbie with ARM tooling. How can I create a project with this package (without mbed)?

JonatanAntoni commented 7 years ago

@gerhil this depends on the toolchain and ide you are using. The basic idea of CMSIS software packs is described in our CMSIS documentation. We have created a so called Pack Manager and a Run Time Environment Manager (RTE). This is part of our MDK Development Tools, and provided as a open-source plug-in for Eclipse.

gerhil commented 7 years ago

Can the plug in be used with MCUXpresso? I use MCUXpresso and mbed at the moment.

JonatanAntoni commented 7 years ago

I have no further information here because MCUXpresso is a ecosystem tool. Refer to the MCUXpresso community as well, please.

gerhil commented 7 years ago

How can I install the plugin? The readme says "Download the CMSISPackPlugin_x.x.zip from corresponding release", but where can I find the file?

ilg-ul commented 7 years ago

Can the plug in be used with XXX?

the general answer is 'yes, as long the XXX tools integrate the ARM packs plug-ins', otherwise expect things to be difficult (this is similar to the classical 'can I have my Ford in any colour? yes, as long as it is black').

I did not check the latest version, but all my previous attempts to use the plug-ins failed, for various reasons, and I finally gave up.

of course, if you have better experiences, or if you find the plug-ins integrated in other tools, please let me know, to re-evaluate.

gerhil commented 7 years ago

@RobertRostohar I have updated mbed but still the same output. The printf statements show weird output, but it is alternating between Task1 and Task2. So, I have surrounded the printf statements with osKernelLock() and osKernelUnlock(). I get good printf output but the example is not alternating anymore. The output is:

Task1
Task1
Task1
Task1

etc. Any idea what the problem is?

RobertRostohar commented 7 years ago

@gerhil Let me clarify my previous statements. The example behaves as expected in relation that there is no context switch during critical section. However it doesn't work in a way that you have desired.

I fully agree with following statements from this thread.

the semantics of scheduler critical sections is very simple and clear: 'no context switches are performed inside the critical section

if you call resume(), the thread will be added to the ready list, but, being inside the scheduler critical section, the context switch is ignored for now.

if you call suspend(), normally the current thread is suspended and execution continues to a thread chosen by the scheduler,

but, in a scheduler critical section, the current thread is resumed immediately

void task1(void *parameter) {
    tid_thread1 = osThreadGetId();
    osThreadSuspend(tid_thread1);
    while(1) {
        printf ("Task1\r\n");
        fflush(stdout);
        osKernelLock();
        osThreadResume(tid_thread2);
        osThreadSuspend(tid_thread1);
        osKernelUnlock();
    }
}

osKernelLock() enters critical section
osThreadResume(tid_thread2) puts thread2 into ready list however does not switch to it since scheduler is locked
osThreadSuspend(tid_thread1) would normally suspend running thread1 however since scheduler is locked it returns immediately without context switch
osKernelUnock() exits critical section
since both running thread1 and ready thread2 have the same priority execution continues with thread1 and repeats the above steps until round robin switches to thread2 (outside critical section)

The described behaviour is expected and explains also the disordered printf outputs.

ilg-ul commented 7 years ago

... I fully agree with following statements ...

ok, great.

but what do you think about the following use case:

void task1(void *parameter) {
    tid_thread1 = osThreadGetId();
    osThreadSuspend(tid_thread1);
    while(1) {
        osKernelLock();
        printf ("Task1\r\n");
        fflush(stdout);
        osKernelUnlock();

        osThreadResume(tid_thread2);
     }
}

except possible printf() oddities, in my opinion it should work. any idea why it did not in @gerhil's test?

ARM-software / CMSIS_5

RTOS2 : question about critical region #171