contiki-ng / contiki-ng

Contiki-NG: The OS for Next Generation IoT Devices
https://www.contiki-ng.org/
BSD 3-Clause "New" or "Revised" License
1.32k stars 700 forks source link

It is not safe to call process_post in an interrupt handler #764

Open debug-ito opened 5 years ago

debug-ito commented 5 years ago

I'm not sure if this is expected or not, but it is not safe to call process_post in an interrupt handler. This is because it may cause inconsistency in the event queue (in sys/process.c).

The consumer of the event queue is do_event function (run in normal context), and the producer is process_post function. Because operation on fevent and nevents are not atomic in do_event, in an interrupt handler process_post may manipulate the event queue while it's in an inconsistent state. This may cause losing a posted event or dispatching an event that was already dispatched before.

I don't think so many modules call process_post in an interrupt context, but at least the tsch module does (in tsch_disassociate, which may be called in tsch_slot_operation thread) (although that is relatively a rare case).

I think we should either

  1. make the event queue thread-safe, probably by using ringbufindex.
  2. or, add warning to the document of process_post, and fix existing calls to process_post in interrupt contexts. For the tsch case above, we can use process_poll instead.
simonduq commented 5 years ago

Wow, that's a really good point. I think you're right! Agree on both suggested options. A thread-safe event queue would be nice.

debug-ito commented 5 years ago

OK, I'll try the option 1, and make a pull-request.

debug-ito commented 5 years ago

Looks like ringbufindex cannot solve this problem completely.

ringbufindex can handle the race condition between a consumer and a producer, but it doesn't handle the race condition between producers (or consumers.) In our case, we have single consumer (do_event in the main loop) and multiple producers (process_post), and the producers can be in the main loop or in the interrupt handler.

So, what we need here is a multi-producer/single-consumer queue. I'm not sure implementing that is worth the effort.

atiselsts commented 5 years ago

Do you think that putting the lines

fevent = (fevent + 1) % PROCESS_CONF_NUMEVENTS;
--nevents;

in a critical section would create too much run-time overhead? That would solve the problem, wouldn't it?

debug-ito commented 5 years ago

Yes, that would solve the problem, and I'm ok with it. However, I don't know if crtical section is supported by all platforms.

What I was thinking of in the above comment is using a lock-free queue, but maybe using critical section is simpler and portable enough.

debug-ito commented 5 years ago

Oh, and we will have to put a critical section in process_post, too, because process_post can preempt itself.

debug-ito commented 5 years ago

So, can I just use os/sys/critical or os/sys/mutex? Which one is better?

g-oikonomou commented 5 years ago

The intention is for those to work for all supported CPUs, so you can assume them platform-independent. Which to use depends what you want to achieve. critical will manipulate interrupts and apply memory barriers. Mutex will also allow you to safely lock/unlock a resource

debug-ito commented 5 years ago

Thanks for advice. I'll use critical.

damien-v commented 4 years ago

This is documented in the contiki-os process documentation: https://github.com/contiki-os/contiki/wiki/Processes#Polling where it is explicitely stated that the process_poll function is the only one from the process module that can be called from an interrupt service routine. Sorry if I'm a little off-topic here but maybe some of the documentation from Contiki-OS and Protothread could be added to the contiki-ng documentation. When I started using contiki-ng, coming from RTOS habits I was really happy for adam dunkels web site: http://dunkels.com/adam/pt/.

I'm more than happy to take on the task if someone thinks it's a good idea.

alexrayne commented 2 years ago

concurent post messages from ISR, breaks contiky-way - it use polling. this PR try drill a tunel trough this behaviour. imho, some more contiki-native solution could provide some service, just like etimer do: a protothread that posts messages on polling from consumer-ISRs. ISR register demanded messages at this service, and atomicaly marks/fills registered messages, wich it wants post. And service proto-thead post them. Why this aproach not suitable for you ?

debug-ito commented 2 years ago

Sorry @alexrayne , which PR are you talking about by "this PR" in the first line in your comment? And who is "you" in the last line?

alexrayne commented 2 years ago

Sorry @alexrayne , which PR are you talking about by "this PR" in the first line in your comment?

that was about PR#792 - that you closed.

And who is "you" in the last line?

i asks initiator of discussion - @debug-ito. Looks that you solves some actual problem, therefore i asks - does ordinar solution (polling) works for that? Or you see some benefits of of atomic event que, that gives more then penalty of sync?

I have some projects, where wanted posts events from ISR, infered bugs with such posting, and hooks them with crashes. And solve this by polling + bool flags in target process context. IMHO, some trivial API for passing events from ISR would handy. I try imagine, how it can be, and first attempt i got - some what like etimer/ctimer thread, that processes ISR-event descriptors. Or maybe some set of primitives - declarations of events, and functions-testers on activity of them.

debug-ito commented 2 years ago

Thanks for clarificaiton @alexrayne .

The design you described in https://github.com/contiki-ng/contiki-ng/issues/764#issuecomment-985985497 would work. I just thought it would be more handy and safer if you could call process_post freely in ISRs.

As I said in https://github.com/contiki-ng/contiki-ng/issues/764#issue-380950076 , making a thread-safe event queue is just one solution candidate to the issue. You can solve it in another way. It's up to Contiki-NG core team to decide which solution candidate to take.

By the way, PR #792 is still open, but we can close it. That's because the PR was kinda rejected already. As I said in https://github.com/contiki-ng/contiki-ng/pull/792#issuecomment-455001144 , I proposed an alternative solution that consists of three parts.

  1. Add sys/atomic, which provides atomic compare-and-swap (CAS) API. (#825 already merged)
  2. Add a new thread-safe ringbuffer component (#1151 still open)
  3. Re-implement the event queue in sys/process using the new ringbuffer component (Not developed yet)

As for the item 2, I originally thought I could make lib/ringbufindex thread-safe (see #883). However, it turned out it's difficult. So I submitted #1151 instead.

alexrayne commented 6 months ago

hallow @debug-ito I woul propose another solution, that may be interests you - Keil RTX provide threads syncronisation via event flags. it gives very simple api - incteard one boolean poll flag, they allow a set of flags. interrupt can set any of them, appliocation can clear any. So we have some kind of event dispatching on a sigle memory word with atomic and/or. imho contiki can realise similar API easy and more native, vs atomic queues, just enhancing current poll_thread API

debug-ito commented 6 months ago

Thanks for comment, @alexrayne . Unfortunately, I don't have time or energy for Contiki-NG project anymore. I'll leave this issue to Contiki-NG core team (or anyone around here).