drivers: It is not possible to use transceiver drivers on a shared SPI-bus

PeterKietzmann commented 9 years ago

This issue results from PR #2756.

Problem: Allowing SPI-transfers for example to receive data in interrupt context could lead to unwanted behavior. Assuming there is an ongoing SPI-transfer and at the same time an external interrupt is triggered by the transceiver. This will lead to two enabled chip selects and undefined behavior on the bus.

Former solution approaches: 1) Ensure that the transceiver is the only device on the bus to guaranty exclusive access to the SPI-bus.
2) Remove all SPI-transfers form the ISR and run them in another thread context. 3) Make sure that an interrupt can only be triggered by the radio during "safe" periods (deactivate interrupt during SPI read/write). 4) Enable only one radio interrupt source at the same time 5) Like 2) but additionally increase the priority of a thread that handles the radios and their IRQs

Criticism of the above proposals: 1)

This is not the general idea of a bus
There may be future devices with more than one integrated peripherals on an internal SPI-bus

2)

Could lead to timing difficulties
Time critical actions could lead to problems due to slower and unpredictable task switching time spans
Note: maxACK-timeout is in the range of 700-900us in 2.5Ghz devices (information source?)
Could introduces additional overhead and task switches

3)

Could also introduce timing errors
Can miss deadlines for ACK packets cause it may take too long to re-enable the IRQs

4)

The ISR-handler would have to decide accordingly to the current driver state
For some states there are several possible results

5)

Could introduces additional overhead and task switches
?

PeterKietzmann commented 9 years ago

@jremmert-phytec-iot , @haukepetersen , @gebart would be nice if you can check and extend the description if I missed relevant informations!

PeterKietzmann commented 9 years ago

Should we add the "bug" or "enhancement" label?

jonas-rem commented 9 years ago

I discussed that topic with Johann today. We came to the conclusion that 1) could be the a reasonable solution for devices with an exclusive SPI interface for the radio due to the following reasons:

The KW2x device is a SiP that integrates the radio in the same package as the MCU; there are two seperate SPI, one for periph and one for the radio. In this SiP exclusive access is guaranted. As there is no dedicated radio-module available, we currently do not see the necessity for an other solution that has significant drawbacks.
SPI aquire is not neccessary in this configuration and will be omitted.
Due to exclusive access in hardware (splitted hardware interfaces) the interrupts doesn't need to be disabled during SPI read/write. Only the GPIO-ISR source from the driver will be masked during SPI read/write. We don't see any drawbacks from that.
Potential drawback: This solution will block the SPI-Bus for peripheral reasons other than the radio in future devices with one shared SPI (Radio+Periph).

Maybe @haukepetersen will have more ideas for devices without exclusive access when he is implementing the atmel driver.

Calculation of the maxACK-timeout (This could be an important fact for the final decision): In [1] listed as macAckWaitDuration: macAckWaitDuration = aUnitBackoffPeriod + aTurnaroundTime + phySHRDuration + ceiling ( 6 × phySymbolsPerOctet )

Whereas:

aUnitBackoffPeriod = 20 * Symbol_length (all PHY)
aTurnaroundTime = 12 * Symbol_length (all PHY)
phySHRDuration = Preamble_Duration + SFD_Duration) = (8+2) * Symbol_length (O-QPSK PHY, 250kps)
ceiling (6 x phySymbolsPerOctet) = ceiling(12) = 12
Symbol_length = 16µs (O-QPSK PHY, 250kbps)

macAckWaitDuration = 320µs + 192µs + 160µs + 192µs = 864µs

[1] https://standards.ieee.org/getieee802/download/802.15.4-2011.pdf

PeterKietzmann commented 9 years ago

@jremmert-phytec-iot thanks for the detailed response and the link. In your last point, what do you mean with "block the SPI-Bus for peripheral reasons "? Is it "hardware-blocking" so no other device can use the bus or what was your suggestion? I remember there was a discussion about a radio with an onboard BLE and 802.15.4 transceiver. Unfortunately I forgot the name. This could be a potential candidate for a device with more than one integrated components on one internal SPI-bus. @gebart did you have some hardware in mind when stating this problem?

jonas-rem commented 9 years ago

Oh true, that sentence was confusing. In the suggested method the radio must have exclusive SPI-bus access. Consequently, if using method 1) devices with just one SPI can not handling other peripheral SPI-devices because this SPI is reserved (blocked) for the radio.

jnohlgard commented 9 years ago

@PeterKietzmann I did not have any particular hardware in mind, at least the kw2x radio will work without locking as long as only one single thread is ever used to communicate with it.

The most likely issue that I see is when an SPI transfer is already in progress from thread context and an interrupt occurs and the ISR wants to perform an SPI transfer as well, which may or may not work correctly, depending on the implementation and SPI hardware used.

One example off the top of my head: If using the hardware CS feature of the Kinetis processors the kw2x driver can spinlock inside the ISR until the SPI status register says there is free space in the TX buffer and then perform the transfer, but this will probably break if using software controlled CS from thread context as the CS pin will already be asserted when the ISR wants to assert it, and the radio will likely ignore the command.

Even if using hardware CS it is still a problem to figure out which of the RX bytes belong to which transfer, since the ISR does not know anything about the transfer that was in progress when the interrupt came.

Thinking about this I see only two possible solutions to this:

Never do SPI transfers from interrupt context
or, disable (some) interrupts while performing SPI transfers in thread context. This means that all drivers on the same bus must disable this IRQ, not just the radio driver itself. (think of the case of a radio and a temperature sensor on the same bus, the temperature sensor driver must disable the radio IRQ every time it does a transfer, ugly cross-dependency problems occur)

haukepetersen commented 9 years ago

After looking a little bit into the Atmel AT86RF2xx driver, I think I tend to actually go with solution 2). This makes sure there is no colliding bus access and we don't prevent the use of multiple devices on the bus.

Correct me if I am wrong, but at least for 802.15.4 devices I think all of them support hardware auto-acking, right? So the sending of ACKs we can just leave to the hardware and therefore do not have to worry about timings.

A workaround for other time critical events could be to use a timestamp (brainstorming here): in the interrupt routine save the value of hwtimer_now() into a field in the device descriptor, and once we are in thread context we can at least compensate for any jittering delays.

What do you think?

jonas-rem commented 9 years ago

Ok seems so, then for Ack packets we should be fine. They do not throw an interrupt at all and are handled completely in HW.

I have thought about the time-span e.g. between an incoming Beacon and the outgoing response. As far as I understand, the timing here is not that important. There is even a minimum time spacing time, LIFS (Long interframe Spacing Period), which is 40symbols (640µs).

The timestamps could be essential to enable slottet CSMA. An other possibility beside the hwtimer_now() timestamping could be to use the hardware timestamping, some IEEE802.15.4 radios provide. Since the timestamps are stored in the radios internal registers, they can also be read out after a certain delay.

_But I am not sure if I got everything correct. E.g. what happens if the message queue is partly or complete filled with TX packets from upper layer. As far as I understand, the msg queue is a LIFO? As soon as the mac-layer thread completes the current action (executes code until it reaches msg_recv() the next time), it will process the most recent msg; which is the pending interrupt message. All other network stack related threads running with lower priority than the mac-thread._ For info: The above was partly confusion of mine, partly an design issue when changing the implementation of the csma_mac layer. The message queue in msg.c should be implemented as FIFO. Under normal use cases the MAC layer-thread will not not be swamped with messages. The radio transceiver should be fast enough to send the messages out faster than they are coming in. However if the radio is not fast enough, the mac layer can receive the message and push the message pointer to the packetqueue interface. If this is full upper layer are automatially informed about this. Consequently message dropping should not be necessary.

~~If that summary of my thoughts is correct,~~ I think I am fine ;).

OlegHahm commented 9 years ago

Correct me if I am wrong, but at least for 802.15.4 devices I think all of them support hardware auto-acking, right?

At least if I think of very cheap transceivers, that may support 802.15.4 PHY, but are not necessarily designed for 802.15.4 L2 this may not hold for every device.

haukepetersen commented 9 years ago

Then these devices would have to handle that in their IRQ with all the implications that come with it for their SPI bus usage. It is still possible to do this - we should just try not to do it if not really really needed.

OlegHahm commented 9 years ago

Any final conclusion?

kaspar030 commented 9 years ago

didn't we fix most drivers? @haukepetersen?

haukepetersen commented 9 years ago

IMHO this is solved by (i) coding convention not to use SPI in interrupt context and (ii) the introduction of spi_acquire and spi_release. -> close

RIOT-OS / RIOT

drivers: It is not possible to use transceiver drivers on a shared SPI-bus #2769