contiki-os / contiki

The official git repository for Contiki, the open source OS for the Internet of Things
http://www.contiki-os.org/

TSCH on Atmega256RFR2 - EB synchronization fail #2212

Open AdmiralWurst opened 7 years ago

AdmiralWurst commented 7 years ago

Hello,

I am trying to enable TSCH on the Atmega256RFR2. I followed the advice in thread https://github.com/contiki-os/contiki/issues/1517 (comment from: simonduq on 23 Feb 2016) and set the following variables to get a minimal working example:

#define TSCH_SCHEDULE_CONF_DEFAULT_LENGTH 1
#define TSCH_CONF_DEFAULT_HOPPING_SEQUENCE TSCH_HOPPING_SEQUENCE_1_1
#define TSCH_CONF_EB_PERIOD (1 * CLOCK_SECOND)
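To make the effect of these settings concrete: IEEE 802.15.4 TSCH maps the absolute slot number (ASN) to a timeslot as ASN mod slotframe length, and to a channel by indexing the hopping sequence with (ASN + channel offset) mod sequence length. The sketch below is a minimal illustration of that mapping; the function names are hypothetical and not Contiki's API. With a one-slot slotframe and a one-entry hopping sequence, every ASN lands on timeslot 0 and the same channel, which is consistent with the `ch-20` in the log below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not Contiki's API: they illustrate the IEEE
 * 802.15.4 TSCH mapping from an absolute slot number (ASN) to a
 * timeslot within the slotframe and to a physical channel. */

/* Timeslot index within a slotframe of the given length. */
static uint16_t timeslot_in_slotframe(uint64_t asn, uint16_t slotframe_len)
{
  assert(slotframe_len > 0);
  return (uint16_t)(asn % slotframe_len);
}

/* Channel used by a link: hopping_seq[(ASN + channel_offset) % seq_len]. */
static uint8_t channel_for_link(uint64_t asn, uint16_t channel_offset,
                                const uint8_t *hopping_seq, size_t seq_len)
{
  assert(hopping_seq != NULL && seq_len > 0);
  return hopping_seq[(asn + channel_offset) % seq_len];
}
```

Keeping both lengths at 1 means the two nodes cannot disagree about which slot or channel is in use, only about how many slots have elapsed.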

In my minimal example I have two nodes: one is the coordinator, which sends an EB every second; the other listens, eventually catches an EB, and joins the network. This is the output I get from the child node:

TSCH: parse_eb: no schedule, setting up minimal schedule
TSCH-schedule: add_slotframe 0 1
TSCH-schedule: add_link 0 15 1 0 0 255
TSCH: update time source: 0 -> 0
TSCH: association done, sec 0, PAN ID abcd, asn-0.35c, jp 1, timeslot id 0, hopping id 0, slotframe len 0 with 0 links, from 5a:ea:00:ff:ff:2e:21:00
Never-used stack > 17860 bytes
TSCH:! ASN drifted by 9, leaving the network
TSCH: leaving the network
TSCH: {asn-0.6b6 link-0-1-0-0 ch-20} bc-0-0 37 rx 0, dr 6, edr -6
TSCH: update time source: 0 -> 0
TSCH:! not associated, drop outgoing packet
TSCH: association: received packet (37 bytes) on channel 20
TSCH: parse_eb: no schedule, setting up minimal schedule
TSCH-schedule: remove_link 0 15 0 0 255
TSCH-schedule: remove slotframe 0 1
TSCH-schedule: add_slotframe 0 1
TSCH-schedule: add_link 0 15 1 0 0 255
TSCH: update time source: 0 -> 0
TSCH: association done, sec 0, PAN ID abcd, asn-0.d5b, jp 1, timeslot id 0, hopping id 0, slotframe len 0 with 0 links, from 5a:ea:00:ff:ff:2e:21:00
TSCH:! ASN drifted by 2, leaving the network
TSCH: leaving the network
TSCH: {asn-0.e18 link-0-1-0-0 ch-20} bc-0-0 37 rx 0, dr 67, edr -67
TSCH: update time source: 0 -> 0
TSCH: association: received packet (0 bytes) on channel 20
TSCH:! parse_eb: failed to parse frame
TSCH:! failed to parse EB (len 0)

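In the output, the following sequence happens twice: the node receives an EB, sets up the minimal schedule, associates, then reports that the ASN drifted (by 9 the first time, by 2 the second) and leaves the network. After that, it receives a 0-byte packet, which fails to parse as an EB.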
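As far as I can tell from Contiki's tsch.c, the "ASN drifted" line is printed when an associated node receives an EB from its time source whose ASN differs from its own, after which the node disassociates. The log prints ASNs as `asn-<ms1b>.<ls4b>` in hex, i.e. the most significant byte and the lower four bytes of the 5-byte ASN. A minimal sketch of that check, with hypothetical type and function names (in this sketch a positive drift means the local counter is ahead):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 5-byte ASN type mirroring the "asn-<ms1b>.<ls4b>" log format. */
struct asn {
  uint32_t ls4b; /* lower four bytes */
  uint8_t ms1b;  /* most significant byte */
};

/* Difference local - remote over the lower four bytes only. Unsigned
 * subtraction makes small differences correct even across a wrap of ls4b. */
static int32_t asn_diff(struct asn local, struct asn remote)
{
  return (int32_t)(local.ls4b - remote.ls4b);
}

/* Compare our ASN with the one carried in an EB from our time source.
 * Returns true if they agree; any mismatch means we should leave. */
static bool eb_asn_consistent(struct asn local, struct asn from_eb, int32_t *drift)
{
  *drift = asn_diff(local, from_eb);
  return *drift == 0;
}
```

A nonzero drift means the node counted a different number of timeslots than its time source since it joined, so the accuracy of the slot timer is a natural suspect.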

My question is: how can the ASNs get out of sync so quickly? Is this a problem with the clock? Any other ideas?

Thank you, Peter

atiselsts commented 7 years ago

Hi, the Atmega256RFR2 is not in the list of officially supported platforms in core/net/mac/tsch/README.md (nor are any other Atmel platforms, although there is a stable TSCH version for RF230 radios outside mainline Contiki).

You don't say which branch you're using. Mainline Contiki is missing the US_TO_RTIMERTICKS macros for Atmel needed to even compile TSCH.
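TSCH schedules its slot timing in rtimer ticks, so the platform has to convert microsecond values with US_TO_RTIMERTICKS (and, as far as I know, the reverse RTIMERTICKS_TO_US). A minimal sketch, assuming an rtimer driven by a 62.5 kHz counter (one tick per 16 µs, the 2.4 GHz O-QPSK symbol period); the RTIMER_ARCH_SECOND value here is an assumption, not taken from any existing Atmel port. The macros round to nearest, handle negative values such as drift corrections, and use 64-bit intermediates because 32-bit products overflow quickly.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: rtimer ticks at 62.5 kHz (16 us per tick). Not from any
 * existing Contiki port. */
#define RTIMER_ARCH_SECOND 62500L

/* Microseconds to rtimer ticks, rounded to nearest (half away from zero).
 * C integer division truncates toward zero, hence the signed +/- offset. */
#define US_TO_RTIMERTICKS(US)                                      \
  ((int32_t)((US) >= 0                                             \
     ? (((int64_t)(US) * RTIMER_ARCH_SECOND + 500000) / 1000000)   \
     : (((int64_t)(US) * RTIMER_ARCH_SECOND - 500000) / 1000000)))

/* Rtimer ticks to microseconds, rounded the same way. */
#define RTIMERTICKS_TO_US(T)                                                \
  ((int32_t)((T) >= 0                                                       \
     ? (((int64_t)(T) * 1000000 + RTIMER_ARCH_SECOND / 2) / RTIMER_ARCH_SECOND) \
     : (((int64_t)(T) * 1000000 - RTIMER_ARCH_SECOND / 2) / RTIMER_ARCH_SECOND)))
```

At 16 µs per tick, a 10 ms interval is exactly 625 ticks, but an arbitrary 2120 µs offset falls between 132 and 133 ticks; truncating instead of rounding, or assuming the wrong tick rate, shifts that timing by a fixed amount in every slot.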

There is some work by @herjulf on porting TSCH to the Atmega256RFR2 (branch: https://github.com/herjulf/contiki/tree/TSCH-MSC). In that branch we got broadcast (EB) synchronization working stably, but there were still some problems with unicast. If you can join in and help, I'm sure that would be appreciated.

herjulf commented 7 years ago

Right.

Yes, I've been working on other things for a while, but there has still been some work on the Atmel radios: set/get functions, some clean-up, HW crypto, etc., so that the Atmel radio has the same API and functions as the other radios. So the step to TSCH should now be much smaller.

I also implemented support for using the Atmel MAC HW symbol counter with rtimer. TSCH runs in interrupt context (not really good). I thought the MAC symbol counter was a good idea, but its interrupt is actually a radio interrupt, and it interferes with TSCH handling. I really wish there was something like a softirq in Contiki, so that IRQ processing could be kept minimal while still letting high-priority tasks like TSCH preempt userland processes. Maybe it's not that difficult...

Feel free to attack it, but I wonder what the best starting point is now...

simonduq commented 7 years ago

@herjulf I agree TSCH spends way too much time in interrupt context. A softirq would be great. Regardless, with its timing requirements, I believe TSCH has to be interrupt-driven, but it should always return quickly and never busy-wait in interrupt context. The only challenge in making this happen, IMO, is at the radio API level, where we need asynchronous I/O primitives and radio interrupt callbacks.
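One way to read "return quickly and never busy-wait" is to turn a receive slot into a small state machine driven by interrupts: the rtimer ISR starts reception and returns, and either a radio rx-done callback or a timeout rtimer ends the slot. The sketch below assumes such an asynchronous radio API with interrupt callbacks; `radio_start_rx`, the callbacks, and the state names are all hypothetical and not Contiki's radio_driver interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event-driven receive slot. */
enum slot_state { SLOT_IDLE, SLOT_WAIT_RX, SLOT_DONE };

struct slot_ctx {
  enum slot_state state;
  int rx_len;           /* frame length, 0 if nothing was received */
  bool radio_listening; /* stand-in for the radio hardware state */
};

/* Stand-in for an asynchronous radio call: it only starts reception. */
static void radio_start_rx(struct slot_ctx *ctx)
{
  ctx->radio_listening = true;
}

/* rtimer interrupt at the start of the timeslot. */
static void slot_start_isr(struct slot_ctx *ctx)
{
  ctx->state = SLOT_WAIT_RX;
  radio_start_rx(ctx); /* returns immediately; no busy-wait */
}

/* Radio interrupt when a frame has been received. */
static void radio_rx_done_isr(struct slot_ctx *ctx, int len)
{
  if(ctx->state != SLOT_WAIT_RX) {
    return; /* stale event after the slot ended: ignore */
  }
  ctx->radio_listening = false;
  ctx->rx_len = len;
  ctx->state = SLOT_DONE;
}

/* rtimer interrupt at the receive deadline. */
static void slot_timeout_isr(struct slot_ctx *ctx)
{
  if(ctx->state == SLOT_WAIT_RX) {
    ctx->radio_listening = false; /* nothing arrived: radio off */
    ctx->rx_len = 0;
    ctx->state = SLOT_DONE;
  }
}
```

Each handler does a bounded amount of work and returns, so no interrupt handler ever spins waiting for the radio.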

herjulf commented 7 years ago

Can't this be done? Say we have a softirq process that is scheduled before the ordinary Contiki "processes". In Linux there is a bit mask with the different softirqs; in Contiki, TSCH could be one softirq. When an rtimer sleep expires, rather than executing the work in IRQ context, the handler could mark the appropriate softirq bit, return from the interrupt, and let the softirq(s) run. If we think of TSCH, softirqs are typically raised from interrupts via rtimer. Before exiting, a softirq can also mark itself for rescheduling so that it executes again.

In Linux we also added hybrid polling, as we had problems with high packet rates simply preempting all other tasks, effectively a DoS attack. The trick was to disable network interrupts and handle the packets in softirq context instead, which gives both polling and fairness. This framework is called NAPI and is still used for Linux network drivers. NAPI is from 2001; softirqs are older.
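The softirq and NAPI ideas above can be sketched in a few dozen lines of C. Everything here is hypothetical: `raise_softirq` and `do_softirq` borrow Linux's names but are not a Contiki API, and the radio queue is a counter standing in for a driver. ISRs only set a pending bit; `do_softirq` runs the pending handlers with a bounded number of restarts; the receive path masks its interrupt, polls at most a fixed budget of frames per run, and unmasks the interrupt only once the queue is drained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical softirq layer: ISRs only mark work as pending and a
 * dispatcher runs it later, before ordinary processes. */
enum { SOFTIRQ_TSCH, SOFTIRQ_NET, SOFTIRQ_COUNT };

typedef void (*softirq_handler_t)(void);

static volatile uint8_t softirq_pending; /* one bit per softirq */
static softirq_handler_t softirq_handlers[SOFTIRQ_COUNT];

/* Bound on re-runs per call, so a softirq that keeps re-raising itself
 * cannot starve processes (Linux applies a similar restart limit). */
#define SOFTIRQ_MAX_RESTART 10

void softirq_register(int nr, softirq_handler_t h)
{
  softirq_handlers[nr] = h;
}

/* Called from an ISR. On a real MCU this read-modify-write must be made
 * atomic, e.g. by briefly disabling interrupts. */
void raise_softirq(int nr)
{
  softirq_pending |= (uint8_t)(1u << nr);
}

/* Runs pending softirqs; returns nonzero if work is still pending. */
int do_softirq(void)
{
  for(int restart = 0; restart < SOFTIRQ_MAX_RESTART && softirq_pending; restart++) {
    uint8_t pending = softirq_pending; /* snapshot and clear; must be  */
    softirq_pending = 0;               /* atomic on real hardware      */
    for(int nr = 0; nr < SOFTIRQ_COUNT; nr++) {
      if((pending & (1u << nr)) && softirq_handlers[nr] != NULL) {
        softirq_handlers[nr]();
      }
    }
  }
  return softirq_pending != 0;
}

/* NAPI-style receive path; the queue counter stands in for a driver. */
#define NET_POLL_BUDGET 4
static int rx_queue_len;       /* frames waiting in the driver */
static int rx_irq_enabled = 1; /* rx interrupt mask */
static int frames_processed;

/* Rx interrupt: mask further rx interrupts and defer the work. */
void radio_rx_isr(void)
{
  rx_irq_enabled = 0;
  raise_softirq(SOFTIRQ_NET);
}

/* NET softirq: handle at most NET_POLL_BUDGET frames per run. */
static void net_softirq(void)
{
  int budget = NET_POLL_BUDGET;
  while(rx_queue_len > 0 && budget-- > 0) {
    rx_queue_len--; /* stand-in for reading and handling one frame */
    frames_processed++;
  }
  if(rx_queue_len > 0) {
    raise_softirq(SOFTIRQ_NET); /* more left: poll again */
  } else {
    rx_irq_enabled = 1;         /* drained: back to interrupt mode */
  }
}
```

Under a burst of 50 frames, one `do_softirq` call handles at most 40 of them (10 restarts of 4 frames) and then returns, leaving room for processes before the rest are polled.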

simonduq commented 7 years ago

Right, and that is easier to do than what I was describing, as the TSCH code and radio API would remain unchanged. We should definitely consider it. Thanks!

herjulf commented 7 years ago

Yes. I would think of TSCH as its own softirq or as part of a NET softirq. In Linux the function is do_softirq(). It has to be scheduled before any Contiki process and after the return from an interrupt, at least for the rtimer interrupt if we only think about TSCH to start with. I haven't hacked the Contiki process code, so I don't know about the challenges involved here.
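A sketch of where such a dispatcher could sit in a Contiki-style main loop, giving pending softirqs a chance before every process event. `process_run_stub` stands in for Contiki's `process_run()`, which handles one event and returns the number of events still waiting; the flag, the trace buffer, and the simulated ISR are hypothetical test scaffolding.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scaffolding. */
static volatile bool tsch_softirq_pending; /* set by the (simulated) rtimer ISR */
static int tsch_runs;           /* how often the TSCH softirq ran */
static int process_events_left; /* events queued for processes */
static int events_run;
static int isr_after_event;     /* simulate the ISR after this event number */
static char trace[16];          /* 'S' = softirq ran, 'P' = process event */
static int trace_len;

/* Minimal dispatcher with a single softirq (TSCH). */
static void do_softirq(void)
{
  if(tsch_softirq_pending) {
    tsch_softirq_pending = false;
    tsch_runs++;
    trace[trace_len++] = 'S';
  }
}

/* Stand-in for Contiki's process_run(): handles one event and returns
 * the number of events still waiting. */
static int process_run_stub(void)
{
  if(process_events_left > 0) {
    process_events_left--;
    events_run++;
    trace[trace_len++] = 'P';
    if(events_run == isr_after_event) {
      tsch_softirq_pending = true; /* as if the rtimer ISR fired here */
    }
  }
  return process_events_left;
}

/* One pass of the main loop: softirqs run before every process event. */
static void main_loop_iteration(void)
{
  int remaining;
  do {
    do_softirq();
    remaining = process_run_stub();
  } while(remaining > 0);
  do_softirq(); /* anything raised during the last event */
}
```

This only runs softirqs between process events, so a process that takes a long time before returning still delays TSCH; that is the limitation a software-interrupt approach would avoid.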

herjulf commented 7 years ago

As I understand it, the challenge is to run the softirqs right after HW interrupts, rather than waiting until a process returns control to the scheduler. One possible idea is to use software interrupts: run the softirqs in a software interrupt, so that HW interrupts can still preempt them. I would think most MCUs have this capability or can emulate it?

simonduq commented 7 years ago

Regardless of how we do this, I feel we should avoid busy-waiting in TSCH. Even if we move from ISR context to softirq, we're still higher priority than processes, so we should minimize time spent there.

herjulf commented 7 years ago

Absolutely.