apache / nuttx

Apache NuttX is a mature, real-time embedded operating system (RTOS)
https://nuttx.apache.org/
Apache License 2.0

Bug: sporadic scheduling does not work for multiple threads #2935

Open JanStaschulat opened 3 years ago

JanStaschulat commented 3 years ago

Hi,

I am using NuttX for micro-ROS on an STM32 microcontroller on an Olimex board. My example is a very simple extension of the NuttX sporadic scheduling example in testing/ostest.

Test setup:

Observation:

Problem description: I came to the conclusion that budget enforcement in NuttX sporadic scheduling only works for one sporadic thread. For real applications, I would like to use multiple threads with sporadic scheduling.

Could you please check the implementation and give support?

patacongo commented 3 years ago

I think that the sporadic scheduler is overly complex and I had always planned to redesign that scheduler. If someone is interested in that redesign, I would be happy to share my thoughts.

patacongo commented 3 years ago

> * When I configure the application with one sporadic thread and one FIFO thread (with lower priority and 100% CPU utilization), budget enforcement works well.

I think in this case, to observe the full range of behaviors, you would need three threads. The two that you are using now:

And also:

Without this high priority thread, it might be the case that the sporadic scheduler is just broken and that the problem has nothing to do with two sporadic threads.

When I tested this many years ago, I added GPIO outputs from the OS task scheduler hooks. Then I could see the task behavior on a logic analyzer. I don't think anyone has used the sporadic scheduler since then, so it could suffer from bit rot or may not be properly verified.

patacongo commented 3 years ago

There are two known issues with the sporadic scheduler in the top-level TODO list:

patacongo commented 3 years ago

You don't mention what the failure behavior is. You say "budget enforcement does not work". It might be helpful to know what you mean by that.

JanStaschulat commented 3 years ago

Thanks for your quick feedback. I ran a couple of experiments with this test program

Some links to the source code:

The results show that NuttX does not schedule two sporadic threads according to the specified budgets:

Experimental Results: 
Setup:
config:
thread 1: 
- SCHED_SPORADIC, 
- prio high 180, prio low 20,
- budget 10ms, period 100ms
- max replenishments = 100

thread 2: 
- FIFO-thread, 
- prio: 120

thread 3: 
- SCHED_SPORADIC, 
- prio high 179, prio low 19,
- budget 30ms, period 100ms
- max replenishments = 100
Hardware: 
- Olimex board (STM32), 
- NuttX OS

Experiment:
- callback function in each thread has a busy_loop of 1ms and increments a counter
- experiment runs for 10 seconds
- at the end the counter values of all threads are reported, i.e. the number of milliseconds
  each thread could execute in the interval of 10 seconds (10000 milliseconds in total)

Exp 1: (one sporadic thread and FIFO thread)
configuration with 
- thread 1:  sporadic thread with budget = x ms and period=100ms
- thread 2: low-prio FIFO thread

config      result      result
sporadic 1  sporadic 1  fifo
budget(ms)  (ms)        (ms) 
---------------------------------
0            96         9815
10         1074         8837
20         2043         7868
30         3014         6896
40         3985         5925
50         4956         4953
60         5920         3990
70         6899         3010
80         7870         2039
90         8804         1105
100        9910            0

Exp 2 (two sporadic threads and FIFO thread)
Keep sporadic thread 2 with 30/100ms budget/period, vary budget of thread 1 from 0 - 100ms
configuration with 
- thread 1:  sporadic thread with budget = x ms and  period=100ms, prio see above
- thread 3:  sporadic thread with budget = 30 ms and period=100ms, prio see above
- thread 2: low-prio FIFO thread, prio see above

config      result      result      result
sporadic 1  sporadic 1  sporadic 2  fifo
budget(ms)  (ms)        (ms)        (ms) 
------------------------------------------
0            145         981       8784
10          1073         971       7864
20          2044          10       7854
30          3013           0       6895
40          9909           0          0
50          9909           0          0
60          9909           0          0
70          9909           0          0       
80          9908           0          0
90          9908           0          0
100         9909           0          0

Exp 3 (two sporadic threads and FIFO thread)
Keep sporadic thread 1 with 30/100ms budget, vary budget of thread 2 from 0 - 100ms
configuration with 
- thread 1:  sporadic thread with budget = 30 ms and period=100ms, prio see above
- thread 3:  sporadic thread with budget = x ms and  period=100ms, prio see above
- thread 2: low-prio FIFO thread, prio see above

config      result      result      result
sporadic 2  sporadic 1  sporadic 2  fifo
budget(ms)  (ms)        (ms)        (ms)
------------------------------------------
0           5246        4661        0
10          7091        2816        0
20          9132        776         0
30          3015        0           6892
40          3016        9           6883
50          3015        49          6844
60          3016        2311        4581
70          3065        2484        4359
80          3015        48          6845
90          3015        91          6802
100         3053        4726        2128

Exp 4 (two sporadic threads and FIFO thread)
Same as Experiment 1, but both sporadic threads with the same priority settings
- sporadic 1: high prio 180, low prio 20
- sporadic 2: high prio 180, low prio 20
- fifo      : prio 120

config      result      result      result
sporadic 1  sporadic 1  sporadic 2  fifo
budget(ms)  (ms)        (ms)        (ms)
------------------------------------------
0           144         981         8784
10          1073        971         7864
20          2044        10          7854
30          3015        0           6892
40          39          3044        6825
50          4957        4950        0
60          4427        1559        3923
70          6880        3028        0
80          7840        2066        0
90          8802        1105        0
100         9861        47          0

Example raw output:
Sporadic thread 1: prio high: 180, low: 20, budget: 10000000
pthread_create: budget 0 s 10000000 ns ticks: 10 , period 0 s 100000000 ns ticks 100 
thread id 8
sporadic thread 2: at prio high 179 low: 19, budget: 30000000
pthread_create: budget 0 s 30000000 ns ticks: 30 , period 0 s 100000000 ns ticks 100 
thread id 9
FIFO thread: prio 120
thread id 10
Result: sporadic 1 1074 ms sporadic 2 19 FIFO 8816 ms
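
For reference, the nominal numbers behind these experiments can be written down as a small sketch. It assumes ideal budget enforcement with no scheduling overhead; the helper names `ms_to_timespec` and `expected_runtime_ms` are hypothetical and are not taken from the test program. The first helper reproduces the budget/period values printed in the raw output (10 ms becomes 0 s 10000000 ns), the second gives the high-priority execution time a sporadic thread would nominally receive over the whole run.

```c
#include <assert.h>
#include <time.h>

/* Convert a budget or period in milliseconds to a struct timespec, the
 * form in which POSIX sporadic-server parameters are expressed.
 * (Hypothetical helper, for illustration only.)
 */

static struct timespec ms_to_timespec(long ms)
{
  struct timespec ts;

  assert(ms >= 0);
  ts.tv_sec  = ms / 1000L;
  ts.tv_nsec = (ms % 1000L) * 1000000L;
  return ts;
}

/* Nominal high-priority execution time (in ms) over run_ms milliseconds,
 * assuming the budget is granted exactly once per replenishment period.
 * (Hypothetical helper, for illustration only.)
 */

static long expected_runtime_ms(long budget_ms, long period_ms, long run_ms)
{
  assert(period_ms > 0 && budget_ms >= 0 && budget_ms <= period_ms);
  return (run_ms / period_ms) * budget_ms;
}
```

With a 10 ms budget, a 100 ms period and a 10 second run, `expected_runtime_ms(10, 100, 10000)` yields 1000 ms, which is the value the 1074 ms measured in Exp 1 should be compared against.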

Discussion:

JanStaschulat commented 3 years ago

Will this bug be fixed?

patacongo commented 3 years ago

> Will this bug be fixed?

Apache projects do not have the kind of project organization that can answer that question. The bug will be fixed if some individual in the community decides to work on it as a contribution. No one is offering that now.

As a starting point, I will clean up your test example and incorporate it into the OS test. It found an important bug so it is of value and should be a part of the test. I'll also create some sporadic configuration to exercise the test and replicate your bug.

patacongo commented 3 years ago

I have incorporated a modified version of your test case into the OS test. These are #apache/incubator-nuttx/3097 and #apache/incubator-nuttx-apps/620

Here is some sample output (using your priorities):

user_main: Dual sporadic thread test
Sporadic 1: prio high 180, low 20, repl 100000000 ns
Sporadic 2: prio high 180, low 20, repl 100000000 ns
  1 Sporadic 1 budget 000000000 ns  58438 ms
    Sporadic 2 budget 030000000 ns  41757 ms
  2 Sporadic 1 budget 010000000 ns  58449 ms (essentially the same as a budget of zero).
    Sporadic 2 budget 030000000 ns  41747 ms
  3 Sporadic 1 budget 020000000 ns  91854 ms
    Sporadic 2 budget 030000000 ns   8352 ms
  4 Sporadic 1 budget 030000000 ns 100208 ms
    Sporadic 2 budget 030000000 ns      0 ms
  5 Sporadic 1 budget 040000000 ns   8451 ms
    Sporadic 2 budget 030000000 ns  91755 ms
  6 Sporadic 1 budget 050000000 ns  58417 ms
    Sporadic 2 budget 030000000 ns  41779 ms

NOTE:

  1. These values are very consistent from run to run in my current setup but probably differ in other situations.

  2. Budget values above 50 MS would exceed the maximum of half of the replenishment interval and would not be expected to work with any accuracy.

  3. Although there are some failures, in general it looks better than the values that you reported above. The only functional difference (with the above test) is that I did remove the FIFO nuisance thread that you claimed was not necessary.
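
The limit mentioned in note 2 can be expressed as a one-line check. This is only a sketch of the rule as stated in the note (budget no larger than half of the replenishment interval); the function name `budget_within_half_period` is hypothetical, and whether the kernel enforces the limit in exactly this form is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if the budget does not exceed half of the replenishment
 * period. Written as budget <= period - budget to avoid both integer
 * truncation and overflow. (Hypothetical helper, for illustration only.)
 */

static bool budget_within_half_period(long budget_ns, long period_ns)
{
  assert(budget_ns >= 0 && period_ns > 0);
  return budget_ns <= period_ns - budget_ns;
}
```

For the 100000000 ns replenishment period used here, a budget of 50000000 ns is still accepted by this check, while 60000000 ns is not.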

Each test case is 100,000 MS total. Expected results:

    BUDGETS                    EXPECTED       ACTUAL  RESULT
1.  sporadic 1 budget   0% :   >=       0 MS  58438   OK
    sporadic 2 budget  30% :   >=  30,000 MS  41757   OK
2.  sporadic 1 budget  10% :   >=  10,000 MS  58449   OK
    sporadic 2 budget  30% :   >=  30,000 MS  41747   OK
3.  sporadic 1 budget  20% :   >=  20,000 MS  91854   OK
    sporadic 2 budget  30% :   >=  30,000 MS   8352   FAIL!!!
4.  sporadic 1 budget  30% :   >=  30,000 MS 100208   OK (but used ALL of the interval)
    sporadic 2 budget  30% :   >=  30,000 MS      0   FAIL!!!
5.  sporadic 1 budget  40% :   >=  40,000 MS   8451   FAIL!!!
    sporadic 2 budget  30% :   >=  30,000 MS  91755   OK
6.  sporadic 1 budget  50% :   >=  50,000 MS  58417   OK
    sporadic 2 budget  30% :   >=  30,000 MS  41779   OK
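
The OK/FAIL column above follows a simple lower-bound rule, which can be sketched as below. The struct and function names are hypothetical and are not part of the OS test; the sketch only checks that a thread received at least its budget share of the total run time, so a thread that takes the whole interval (case 4, sporadic 1) still counts as OK.

```c
#include <assert.h>
#include <stdbool.h>

/* One row of the results table. (Hypothetical type, for illustration.) */

struct budget_result
{
  int  budget_percent;  /* Budget as a percentage of the period */
  long actual_ms;       /* Measured execution time in milliseconds */
};

/* Minimum execution time implied by the budget over total_ms. */

static long expected_min_ms(int budget_percent, long total_ms)
{
  assert(budget_percent >= 0 && budget_percent <= 100 && total_ms >= 0);
  return (total_ms * budget_percent) / 100;
}

/* Lower-bound check only: true if the thread got at least its budget. */

static bool meets_budget(const struct budget_result *r, long total_ms)
{
  assert(r != 0);
  return r->actual_ms >= expected_min_ms(r->budget_percent, total_ms);
}
```

Applied to case 3, sporadic 2 (30% budget, 8352 ms of 100,000 ms), the check fails because 8352 is below the 30,000 ms minimum.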

I believe that this may be largely an artifact of the identical priorities for the two sporadic threads. Consider this priority change:

user_main: Dual sporadic thread test
Sporadic 1: prio high 180, low 20, repl 100000000 ns
Sporadic 2: prio high 170, low 30, repl 100000000 ns
  1 Sporadic 1 budget 000000000 ns   8348 ms
    Sporadic 2 budget 030000000 ns  91853 ms
  2 Sporadic 1 budget 010000000 ns  16707 ms
    Sporadic 2 budget 030000000 ns  83495 ms
  3 Sporadic 1 budget 020000000 ns  25064 ms
    Sporadic 2 budget 030000000 ns  75142 ms
  4 Sporadic 1 budget 030000000 ns  33422 ms
    Sporadic 2 budget 030000000 ns  66785 ms
  5 Sporadic 1 budget 040000000 ns  41777 ms
    Sporadic 2 budget 030000000 ns  58429 ms
  6 Sporadic 1 budget 050000000 ns  50125 ms
    Sporadic 2 budget 030000000 ns  50081 ms

Expected results:

    BUDGETS                    EXPECTED       ACTUAL    RESULT
1.  sporadic 1 budget   0% :   >=       0 MS   8348 MS  OK
    sporadic 2 budget  30% :   >=  30,000 MS  91853 MS  OK
2.  sporadic 1 budget  10% :   >=  10,000 MS  16707 MS  OK
    sporadic 2 budget  30% :   >=  30,000 MS  83495 MS  OK
3.  sporadic 1 budget  20% :   >=  20,000 MS  25064 MS  OK
    sporadic 2 budget  30% :   >=  30,000 MS  75142 MS  OK
4.  sporadic 1 budget  30% :   >=  30,000 MS  33422 MS  OK
    sporadic 2 budget  30% :   >=  30,000 MS  66785 MS  OK
5.  sporadic 1 budget  40% :   >=  40,000 MS  41777 MS  OK
    sporadic 2 budget  30% :   >=  30,000 MS  58429 MS  OK
6.  sporadic 1 budget  50% :   >=  50,000 MS  50125 MS  OK
    sporadic 2 budget  30% :   >=  30,000 MS  50081 MS  OK

The fact that this priority change eliminates the problem still suggests to me that there is some issue, but that it is just more subtle than it originally appeared. Some of this is misleading too: By raising thread 2's lower priority to 30, it always runs for most of the replenishment interval. It would be better to have a CPU hog FIFO thread at a priority of about 100. Then neither sporadic thread could run in its lower priority state and we should then see the counts only for the sporadic threads when they are in the higher priority state.

JanStaschulat commented 3 years ago

@patacongo thanks for including it in the os-tests.

Yes, I agree, there should be a third thread scheduled with FIFO (like in my test setup) that eats up the remaining cycles. Proposed setup:

user_main: Dual sporadic thread test
Sporadic 1: prio high 180, low 20, repl 100000000 ns
Sporadic 2: prio high 180, low 20, repl 100000000 ns
FIFO      : prio 100,  (busy loop, which does computation all the time)

Then, a sporadic thread with a budget of e.g. 30 % shall also result in about 30% processing time, and not any value above 30%. I think with this setup you can properly verify the correctness of the sporadic server scheduling algorithm.

patacongo commented 3 years ago

> @patacongo thanks for including it in the os-tests.
>
> Yes, I agree, there should be a third thread scheduled with FIFO (like in my test setup) that eats up the remaining cycles. Proposed setup:
>
> user_main: Dual sporadic thread test
> Sporadic 1: prio high 180, low 20, repl 100000000 ns
> Sporadic 2: prio high 180, low 20, repl 100000000 ns
> FIFO      : prio 100,  (busy loop, which does computation all the time)
>
> Then, a sporadic thread with a budget of e.g. 30 % shall also result in about 30% processing time, and not any value above 30%. I think with this setup you can properly verify the correctness of the sporadic server scheduling algorithm.

I did this in a different way: I added two counts, one when the priority is high and one when the priority is low. The high priority count should be equal to the budget. Low priority counts will occur when the CPU is IDLE and has nothing else to do.
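
The two-counter idea can be sketched roughly as follows. This is not the code from the OS test; the type and function names are hypothetical. It assumes the busy loop samples the thread's current priority once per iteration (for example with POSIX `sched_getparam(0, &param)`, assuming the reported priority reflects the current sporadic state) and passes it in together with the configured high priority.

```c
#include <assert.h>
#include <stdint.h>

/* Per-thread counters. (Hypothetical type, for illustration only.) */

struct prio_counts
{
  uint32_t hi;  /* Samples taken while running at the high priority */
  uint32_t lo;  /* Samples taken while running at any other priority */
};

/* Account one loop iteration to the high or low counter depending on
 * the priority the thread currently runs at.
 */

static void count_sample(struct prio_counts *c, int current_prio,
                         int hi_prio)
{
  assert(c != 0);

  if (current_prio == hi_prio)
    {
      c->hi++;
    }
  else
    {
      c->lo++;
    }
}
```

With one sample per millisecond of busy looping, `hi` approximates the time spent inside the budget and `lo` the time picked up while the CPU would otherwise be idle.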

Now, I can see the problem more clearly. I will edit this comment and report the results in a few minutes. ... Here are the results of the modified test:

user_main: Dual sporadic thread test
Sporadic 1: prio high 180, low 20, repl 100000000 ns
Sporadic 2: prio high 170, low 30, repl 100000000 ns

        THREAD    BUDGET  HI MS  LO MS
  1 Sporadic 1 000000000   8344      0
    Sporadic 2 030000000  41757  50092
  2 Sporadic 1 010000000  16706      0
    Sporadic 2 030000000  41750  41742
  3 Sporadic 1 020000000  25063      0
    Sporadic 2 030000000   8352  66786
  4 Sporadic 1 030000000  33421      0
    Sporadic 2 030000000      0  66782
  5 Sporadic 1 040000000  41775      0
    Sporadic 2 030000000      0  58426
  6 Sporadic 1 050000000  50123      0
    Sporadic 2 030000000      0  50079

Now you can see that the behavior is the same as in your original report: the higher priority budget interval does not occur once the thread 1 budget equals or exceeds the thread 2 budget.

The modified test is incubator-nuttx-apps PR 623.

patacongo commented 3 years ago

PR #3111 corrects some of the problems, but not all:

user_main: Dual sporadic thread test
Sporadic 1: prio high 180, low 20, repl 100000000 ns
Sporadic 2: prio high 170, low 30, repl 100000000 ns

        THREAD    BUDGET  HI MS  LO MS
  1 Sporadic 1 000000000   8342      0
    Sporadic 2 030000000  41749  50095
  2 Sporadic 1 010000000  16699      0
    Sporadic 2 030000000  41745  41742
  3 Sporadic 1 020000000  25056      0
    Sporadic 2 030000000   8351  66784
  4 Sporadic 1 030000000  33413      0
    Sporadic 2 030000000      0  66779
  5 Sporadic 1 040000000  41766      0
    Sporadic 2 030000000  41733  16687
  6 Sporadic 1 050000000  50114      0
    Sporadic 2 030000000  41725   8348

It certainly does narrow the problem down to the case where both threads' budget times complete at approximately the same time.

patacongo commented 3 years ago

I believe that I understand the problem. It is complex to explain.

That is consistent with the condition we see that causes the failure (i.e., with both budget intervals the same) and with the counting that we see in collected data (no high priority counts). But without any data, it is just a fantasy.

A solution would require additional state information to detect the case that thread 2 was not initially running. There is already a sporadic->suspended that is set to true when the thread is started. However, it is reset to false when thread 2 resumes (actually runs for the first time) so that information is lost.

Here is an improved description of the failure scenario:

I am not quite sure how to fix this.

patacongo commented 3 years ago

Today, I planned to add some instrumentation in the form of debug output to a RAM log to analyze this problem. The RAM log is very fast so I did not expect any issues. However, I found that generating a lot of debug output would eliminate the problem. Even generating a small amount of debug output caused only some losses in budget.

This is bad in that it means there is no simple way to debug the issue. It is good, however, in that it supports the idea that it is a race condition that causes the problem. The primary effect of using the RAM log is very small timing delays.

patacongo commented 3 years ago

> there should be a third thread scheduled with FIFO (like in my test setup) that eats up the remaining cycles.

I have a hunch that this would eliminate the problem seen in the case where both budgets are 30 MS because I think it would eliminate the condition that leads to the race condition. However, that problem is a real issue so it is good for the time being that this test reveals the problem.

JanStaschulat commented 3 years ago

We published a paper using the sporadic scheduler of NuttX in the context of micro-ROS: https://arxiv.org/abs/2105.05590

GooTal commented 12 months ago

Oh, I have a question about this, too.

If a sporadic thread blocks during its high-priority budget and then wakes up during its low-priority phase, the sporadic thread will just execute at low priority. But its budget was never actually consumed during that replenishment interval.

I think we should let the sporadic thread continue to run at high priority if its budget is not really consumed and the replenishment time has not yet arrived.

I think the problem might be the watchdog. sporadic_budget_start starts a watchdog, and sporadic_budget_expire then sets the thread to low priority. Then sporadic_interval_start is called and sporadic_interval_expire is called. This means that once the thread is blocked during high priority, the high-priority budget watchdog still consumes the budget.

I came up with an idea that might be useful: Let's just set the replenishment watchdog. When the replenishment time comes, the thread's budget is replenished no matter how much of it is left. While the sporadic thread is running, let tcb->timeslice indicate the budget. For example, replenishment = 5, budget = 2: tcb->timeslice is set to 2 initially. Once the budget of 2 is consumed, tcb->timeslice is set to 0. In this case, if the thread is blocked during high priority, it can still be rescheduled at high priority until its budget is consumed. This would need some modification to the scheduler.
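
The proposal can be illustrated with a small tick-based model. This is a sketch of the idea only, not NuttX scheduler code: `struct ss_model`, `ss_init` and `ss_tick` are hypothetical names, and the `timeslice` field merely stands in for tcb->timeslice. Budget is charged only for ticks in which the thread actually runs, and a single replenishment timer refills it at every period boundary.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of one sporadic thread. (Hypothetical type, for illustration.) */

struct ss_model
{
  int budget;     /* High-priority ticks granted per period */
  int period;     /* Replenishment period in ticks */
  int timeslice;  /* Remaining budget (stands in for tcb->timeslice) */
  int elapsed;    /* Ticks since the last replenishment */
};

static void ss_init(struct ss_model *m, int budget, int period)
{
  assert(m != 0 && period > 0 && budget >= 0 && budget <= period);
  m->budget    = budget;
  m->period    = period;
  m->timeslice = budget;
  m->elapsed   = 0;
}

/* Advance the model by one tick. 'running' says whether the thread is
 * executing (not blocked) during this tick. Returns true if the tick
 * was executed at high priority.
 */

static bool ss_tick(struct ss_model *m, bool running)
{
  bool high = false;

  /* Charge the budget only while the thread really runs, so time spent
   * blocked does not consume it.
   */

  if (running && m->timeslice > 0)
    {
      m->timeslice--;
      high = true;
    }

  /* Single replenishment timer: refill at each period boundary, no
   * matter how much budget is left.
   */

  if (++m->elapsed >= m->period)
    {
      m->elapsed   = 0;
      m->timeslice = m->budget;
    }

  return high;
}
```

With replenishment = 5 and budget = 2, a thread that runs for one tick, blocks for two ticks and then resumes still gets its second high-priority tick in the model, and the budget is back at 2 when the next period starts.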

I'm not sure if the modified scheduling can still be called "sporadic scheduling". I've also read some other papers, and there are some differences. Here is the list:

1. Scheduling Aperiodic Tasks in Dynamic Priority Systems. This paper describes the "Dynamic Sporadic Server", which is a bit different from NuttX.
2. Aperiodic servers in a deadline scheduling environment.
3. QNX doc. This doc also describes sporadic scheduling.
4. Aperiodic Task Scheduling for Real-Time Systems. This paper describes sporadic scheduling under RM scheduling, I suppose.

@patacongo Thanks.