SpaceTeam / STS1_COBC_SW

Software for the communication and onboard computer (COBC) of SpaceTeamSat1 (STS1)
MIT License

Add error handling for SPI code #237

Open PatrickKa opened 6 months ago

PatrickKa commented 6 months ago

Description

As discussed in #178, the SPI communication will keep using blocking functions. However, we still want to be able to detect timeouts. Since I don't know of any way to interrupt a blocking function, we can't simply add a timeout to the SPI functions themselves. The best solution I could come up with is an additional thread that watches the SPI communication and checks whether data transfers are taking too long. This also requires some global state that stores whether a transfer is in progress and when it should finish. What I am still not sure about is what exactly to do when a timeout occurs. An error counter should be incremented in any case, but since we cannot interrupt or cancel the SPI function, whichever thread called it might stay stuck indefinitely.
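A minimal sketch of the watcher-thread idea, assuming plain C++17 (`std::atomic`, `std::chrono`, `std::this_thread`) instead of the RODOS primitives the COBC software actually runs on. All names (`SpiTransferState`, `BeginTransfer`, `EndTransfer`, `CheckForTimeout`, `SupervisorLoop`) are hypothetical. The thread doing SPI stores a deadline before each blocking transfer, and the watcher periodically compares it with the current time and counts each overrunning transfer once. As the issue says, it cannot cancel the blocked call; it can only notice it.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical sketch: std::chrono/std::thread stand in for RODOS time and threads.
using Clock = std::chrono::steady_clock;

// Shared state between the thread doing SPI and the watcher thread.
// In the firmware this would be a single global instance.
struct SpiTransferState
{
    std::atomic<bool> inProgress{false};
    std::atomic<Clock::rep> deadline{0};       // Deadline in Clock ticks since the clock's epoch
    std::atomic<int> timeoutErrorCount{0};
    std::atomic<bool> timeoutReported{false};  // Count each stuck transfer only once
};

// Call right before a blocking SPI transfer starts.
void BeginTransfer(SpiTransferState & state, Clock::duration timeout)
{
    assert(!state.inProgress);  // This sketch assumes transfers never overlap
    state.deadline = (Clock::now() + timeout).time_since_epoch().count();
    state.timeoutReported = false;
    state.inProgress = true;  // Published last, so the watcher sees the new deadline
}

// Call right after the blocking SPI transfer returns.
void EndTransfer(SpiTransferState & state)
{
    state.inProgress = false;
}

// One watcher check. Returns true if a new timeout was detected at time `now`.
bool CheckForTimeout(SpiTransferState & state, Clock::time_point now)
{
    if(!state.inProgress || state.timeoutReported)
    {
        return false;
    }
    if(now.time_since_epoch().count() <= state.deadline)
    {
        return false;
    }
    state.timeoutReported = true;
    ++state.timeoutErrorCount;  // What else to do here is still an open question
    return true;
}

// Body of the watcher thread: polls the shared state until `stop` is set.
void SupervisorLoop(SpiTransferState & state, std::atomic<bool> & stop, Clock::duration period)
{
    while(!stop)
    {
        CheckForTimeout(state, Clock::now());
        std::this_thread::sleep_for(period);
    }
}
```

Passing `now` into `CheckForTimeout` keeps the check testable without actually waiting. The sketch ignores some edge cases (e.g., a transfer ending and a new one starting between two loads in the watcher), which a real implementation would have to handle.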

Edit: After the discussions in person, online, and in the comments below, I am turning this into an Epic to track everything directly related to our SPI error handling concept.

To do

PatrickKa commented 6 months ago

Blocked by #38

PatrickKa commented 5 months ago

David and I thought the whole thing through again and documented it in Miro with some nice flowcharts. Basically, the supervisor thread increments error counters and triggers resets. In addition, there are threads that perform startup tests. These tests can disable communication with the FRAM, the EPS, or the flash entirely, and the rest of the code has to adapt to that: if the FRAM does not work, we just store everything in RAM; if the flash does not work, we cannot use the EDU; and if the RF does not work, we are pretty much doomed, so we always reset.
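The fallback rules above can be summarized as a small decision function. This is only a sketch with hypothetical names (`PeripheralStatus`, `Storage`, `SystemConfig`, `AdaptToStartupTests`), assuming each startup-test thread reports a single pass/fail flag per peripheral. The comment does not say what else changes when the EPS test fails, so that result only disables EPS communication here.

```cpp
// Hypothetical pass/fail results written by the startup-test threads.
struct PeripheralStatus
{
    bool framWorks = true;
    bool epsWorks = true;
    bool flashWorks = true;
    bool rfWorks = true;
};

// Where data that would normally be persisted ends up.
enum class Storage
{
    fram,
    ram
};

// What the rest of the code has to adapt to.
struct SystemConfig
{
    Storage persistentStorage = Storage::fram;
    bool eduEnabled = true;
    bool epsCommunicationEnabled = true;
    bool mustReset = false;
};

SystemConfig AdaptToStartupTests(PeripheralStatus const & status)
{
    SystemConfig config;
    // No FRAM: keep everything in RAM instead.
    config.persistentStorage = status.framWorks ? Storage::fram : Storage::ram;
    // No flash: the EDU cannot be used.
    config.eduEnabled = status.flashWorks;
    // Failed EPS test: disable EPS communication (further fallback not specified).
    config.epsCommunicationEnabled = status.epsWorks;
    // No RF: nothing sensible left to do, so always reset.
    config.mustReset = !status.rfWorks;
    return config;
}
```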