Need a timeslot for SPI operation

codezork commented 7 years ago

Hi. Is it possible to do some SPI operations in a defined timeslot without disturbing the mesh radio? I would like to read a sensor every 0.001 sec and send the averaged readings over the mesh every 5 sec. I'm on an nRF51, so there is no easy DMA in an SPI master configuration and I cannot use the PPI. I tried reading data only while ble_radio_notification_evt_handler reports radio_active == false, but that leads to a reset after a few attempts to send data over the mesh. Does the mesh logic reset when it sees that no time is left for advertising? Do you have a hint for me?
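For reference, the sampling scheme described here (one reading per 1 ms, one averaged value per 5 s) could be sketched roughly as follows. This is only an illustration under assumptions: the names sample_avg_t, sample_avg_push and SAMPLES_PER_REPORT are hypothetical, readings are assumed to be 16-bit, and the SPI read and the mesh send are left out entirely.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical constant: 5 s of readings at one reading per 1 ms. */
#define SAMPLES_PER_REPORT 5000u

/* Hypothetical accumulator for one reporting window. */
typedef struct {
    uint32_t sum;    /* 5000 * 65535 = 327,675,000, which fits in 32 bits */
    uint32_t count;  /* readings collected in the current window */
} sample_avg_t;

/* Add one 16-bit reading to the window.
 * Returns true and writes the mean to *avg_out once the window is full,
 * then resets the accumulator for the next window. */
static bool sample_avg_push(sample_avg_t *acc, uint16_t sample, uint16_t *avg_out)
{
    acc->sum += sample;
    acc->count++;
    if (acc->count < SAMPLES_PER_REPORT) {
        return false;
    }
    *avg_out = (uint16_t)(acc->sum / acc->count);
    acc->sum = 0;
    acc->count = 0;
    return true;
}
```

The push itself is only a few instructions, so it can run in the sampling interrupt, while the returned average can be handed to the mesh from a lower-priority context.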

trond-snekvik commented 7 years ago

Hi,

This is the problem timeslot_stop() intends to solve. The mesh does not use the Softdevice for radio operation, and the ble_radio_notification_evt_handler has no relation to the mesh radio usage.

The timeslot is the timing-critical element of the mesh. If it is not ended in time, the Softdevice will trigger a hardfault, which could be the source of your problem. If you lock IRQs or otherwise block the TIMER0 IRQ for more than 100us, you will eventually get a hardfault. The solution is to end the timeslot before you do your operation (or otherwise guarantee that it won't run out while you block).

timeslot_stop() is exposed in the rbc_mesh API through the rbc_mesh_disable() function. It will end the timeslot without ordering a new one, and should allow you to work freely without breaking the mesh until you call rbc_mesh_enable(). Note that if you have any Softdevice activity running, this will not halt that - you have to handle that in your application. If there is no way for you to do this reliably, look into the mesh_flash module, which operates in the highest priority, interleaved with radio and timer operations.

codezork commented 7 years ago

Hi, unfortunately this doesn't work at this speed. I'm still working on the ping-pong-throughput branch, so I tried rbc_mesh_stop and rbc_mesh_start. Restarting the mesh takes a bit of time, and toggling the mesh on and off leads to a HardFault even more often. I am trying to measure the actual current on a mains line. Even a rate of one async SPI master rx operation every 0.005 sec leads to a reset. The operation should take less than 0.000005 sec to read 16 bits at 4 MHz. Am I wrong? I just need a timeslot every 0.01 sec for this duration. I adapted your code to SDK 11 and am using the S130 Softdevice. Without the SPI operation it works fine. The timer and SPI run at APP_IRQ_PRIORITY_LOW. I could not figure out the trick used in the mesh_flash module. Another hint would be useful. Thanks in advance.

trond-snekvik commented 7 years ago

Alright, sounds like we have to bring in something heavier.

Your assumption on the SPI read time seems reasonable. The 5us operation time should in theory be more than short enough to allow an IRQ lock without inducing any crashes in the mesh, so you could attempt a simple IRQ lock instead of the enable/disable.
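The estimate can be checked with a little arithmetic. The helpers below are a sketch with hypothetical names (spi_transfer_ns, spi_bus_load_ppm); they count only the raw clocking time on the bus and ignore driver, chip-select, and interrupt overhead, which on real hardware add to the total.

```c
#include <stdint.h>

/* Time in nanoseconds needed to clock `bits` bits at `clock_hz`.
 * Raw bus time only: no driver, chip-select, or interrupt latency. */
static uint32_t spi_transfer_ns(uint32_t bits, uint32_t clock_hz)
{
    return (uint32_t)(((uint64_t)bits * 1000000000ull) / clock_hz);
}

/* Share of each sampling period spent on the bus, in parts per million. */
static uint32_t spi_bus_load_ppm(uint32_t transfer_ns, uint32_t period_ns)
{
    return (uint32_t)(((uint64_t)transfer_ns * 1000000ull) / period_ns);
}
```

For 16 bits at 4 MHz, spi_transfer_ns(16, 4000000) gives 4000 ns, i.e. 4us, which is well under the 100us limit for blocking the TIMER0 IRQ mentioned earlier. At one read per 1 ms, spi_bus_load_ppm(4000, 1000000) gives 4000 ppm, or 0.4% of the time.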

The mesh_flash module is called by the timeslot signal handler, and will check how much time is left in the timeslot, then only execute the operations it knows it can fit. I thought a similar mechanism could be used for your problem, but I now realize I might be wrong.
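The idea of "only execute the operations that fit" can be illustrated with a small sketch. This is not the mesh_flash implementation: slot_op_t, slot_ops_run, the safety margin, and the demo_spi_read stand-in are all hypothetical, and each operation is assumed to come with a known worst-case duration.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical descriptor for one queued operation. */
typedef struct {
    uint32_t cost_us;      /* assumed worst-case duration in microseconds */
    void (*run)(void);     /* the operation itself */
} slot_op_t;

/* Run queued operations in order while they still fit in the remaining
 * timeslot time minus a safety margin. Stops at the first operation that
 * does not fit, so ordering is preserved. Returns how many were executed. */
static size_t slot_ops_run(const slot_op_t *ops, size_t n,
                           uint32_t time_left_us, uint32_t margin_us)
{
    size_t done = 0;
    uint32_t budget = (time_left_us > margin_us) ? time_left_us - margin_us : 0;

    while (done < n && ops[done].cost_us <= budget) {
        ops[done].run();
        budget -= ops[done].cost_us;
        done++;
    }
    return done;
}

/* Stand-in for a short SPI read, used only to demonstrate the scheduler. */
static uint32_t g_reads;
static void demo_spi_read(void) { g_reads++; }
```

With three queued 5us reads, 112us left, and a 100us margin, only the first two reads run; the third waits for the next timeslot. The budget here is decremented by the estimated costs rather than by re-reading a timer, so the estimates have to be true upper bounds for the scheme to be safe.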

What are the response-time requirements for your 1ms interval timeout? You might run into delays in APP_IRQ_PRIORITY_LOW, as this is the context the mesh uses for packet processing and state maintenance. If you don't call the Softdevice in your IRQ handlers, I recommend moving them up to APP_IRQ_PRIORITY_HIGH, as the mesh might block you out for milliseconds at a time.

The mesh also does some work in the highest IRQ priority, but this is limited to simple state changes, and shouldn't block you for more than 50us at a time.

codezork commented 7 years ago

It looks like it is not possible to use nrf_drv_spi_transfer, due to IRQ conflicts during mesh activity. I tried stripping the SPI finished_transfer callback down to the minimum to exclude possible side effects. After some readings a HardFault happens if the timing is bad. To measure current flow with the sensor reliably, I need at least 5 readings every 0.01 sec without significant delay. APP_IRQ_PRIORITY_HIGH doesn't help. For the time being I would be happy just to avoid the HardFault, even with imprecise readings. I also tried disabling the TIMER0 interrupt while reading, using NVIC_DisableIRQ(TIMER0_IRQn). That seems to make it work longer, but not forever. The reading frequency has no influence on how often the crash happens. I even tried reading 100 times every 0.01 sec. It works, but crashes after some seconds when the timing becomes unsuitable.

codezork commented 7 years ago

It turned out that the hard fault does not come from the reading but from the excessive use of notifications to the host via sd_ble_gatts_hvx. Doing this frequently causes the S130 Softdevice to crash at address 0x104FA. It looks like a timeslot not being ended in time. See https://devzone.nordicsemi.com/question/105068/got-nrf_fault_id_sd_assert-on-s130-while-doing-sd_ble_gatts_hvx/

NordicPlayground / nRF51-ble-bcast-mesh

Need a timeslot for SPI operation #137