Add error handling for SPI code

PatrickKa commented 11 months ago

Description

As discussed in #178, the SPI communication will keep using blocking functions. However, we still want to be able to detect timeouts. Since I don't know of any way to interrupt a blocking function, we can't just add a timeout to the SPI functions directly. The best solution I could come up with is an additional thread that watches the SPI communication and checks whether the data transfers are taking too long. This also requires some global state that records whether a transfer is in progress and when it should finish. I am still not sure what exactly to do when a timeout occurs. An error counter should be updated in any case, but we cannot interrupt or cancel the SPI function, so whatever thread called it might get stuck indefinitely.
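To make the idea above more concrete, here is a minimal sketch of the shared state and the supervisor check in standard C++. It is only an illustration under assumptions: the names `transferDeadline`, `spiTimeoutCounter`, `BeginTransfer`, `EndTransfer`, and `CheckForTimeout` are hypothetical, and it uses `std::atomic` and `std::chrono` rather than whatever primitives the flight software actually provides. The current time is passed in as a parameter so that the logic can be tested without real threads.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

using Clock = std::chrono::steady_clock;
using Ticks = Clock::duration::rep;

// Hypothetical global state shared by the SPI callers and the supervisor.
// A value of noTransfer means that no transfer is currently in progress.
constexpr Ticks noTransfer = 0;
std::atomic<Ticks> transferDeadline{noTransfer};
std::atomic<std::uint32_t> spiTimeoutCounter{0};

// Call right before the blocking SPI function: store when it should finish.
void BeginTransfer(Clock::time_point now, Clock::duration timeout)
{
    transferDeadline.store((now + timeout).time_since_epoch().count());
}

// Call right after the blocking SPI function has returned.
void EndTransfer()
{
    transferDeadline.store(noTransfer);
}

// Call periodically from the supervisor thread. Returns true if a transfer
// is overdue. The blocked caller itself cannot be interrupted from here.
bool CheckForTimeout(Clock::time_point now)
{
    auto deadline = transferDeadline.load();
    if(deadline == noTransfer || now.time_since_epoch().count() <= deadline)
    {
        return false;
    }
    // Clear the deadline only if it is still the one we inspected, so that
    // each overdue transfer is counted exactly once.
    if(transferDeadline.compare_exchange_strong(deadline, noTransfer))
    {
        spiTimeoutCounter.fetch_add(1);
        return true;
    }
    return false;
}
```

A caller would wrap each blocking transfer in `BeginTransfer(Clock::now(), timeout)` and `EndTransfer()`, while the supervisor thread calls `CheckForTimeout(Clock::now())` in its loop. What to do when it returns `true` is exactly the open question from the description.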

Edit: After the discussions in person, online, and in the comments below, I am turning this into an epic to track everything directly related to our SPI error handling concept.

To do

[x] #265
[x] #266
[x] #267
[x] #268
[ ] #305

PatrickKa commented 11 months ago

Blocked by #38

PatrickKa commented 10 months ago

David and I thought about the whole thing again. We documented it in Miro with some nice flowcharts. Basically, the supervisor thread increments error counters and triggers resets. In addition, we also have threads that perform startup tests. These tests can disable the whole FRAM/EPS/flash communication, and the rest of the code has to adapt to that. If the FRAM does not work, we just store everything in RAM. If the flash does not work, we cannot use the EDU. If the RF does not work, we are pretty much doomed, so we always reset.
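The fallback rules above can be written down as a small decision function. This is just a sketch: `DeviceStatus`, `StorageBackend`, `OperatingMode`, and `SelectOperatingMode` are hypothetical names, not taken from the repository. EPS is left out because the comment does not say what the fallback for a disabled EPS is.

```cpp
// Hypothetical result of the startup tests, one flag per device.
struct DeviceStatus
{
    bool framIsWorking;
    bool flashIsWorking;
    bool rfIsWorking;
};

enum class StorageBackend
{
    fram,
    ram
};

// Hypothetical summary of how the rest of the code should adapt.
struct OperatingMode
{
    StorageBackend storage;
    bool eduIsUsable;
    bool resetIsRequired;
};

constexpr OperatingMode SelectOperatingMode(DeviceStatus status)
{
    return OperatingMode{
        // No FRAM: keep everything in RAM instead.
        status.framIsWorking ? StorageBackend::fram : StorageBackend::ram,
        // No flash: the EDU cannot be used.
        status.flashIsWorking,
        // No RF: nothing else helps, so always reset.
        !status.rfIsWorking};
}
```

For example, `SelectOperatingMode({false, true, true})` selects RAM storage while leaving the EDU usable and requesting no reset.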

SpaceTeam / STS1_COBC_SW

Add error handling for SPI code #237
