GaloisInc / BESSPIN-Tool-Suite

The core tool of the BESSPIN Framework.

Detect Teensy failure #1202

Closed podhrmic closed 3 years ago

podhrmic commented 3 years ago

We can detect a Teensy failure in the ECU: it manifests as the raw values being zero. For example:

50:43:16.715 (prvInfoTask:raw) throttle: 0, brake: 0
50:43:16.719 (prvInfoTask:scaled) Gear: D, throttle: 0, brake: 100
50:43:16.725 (prvInfoTask:hz) prvSensorTask: 19[Hz]

Because the brake channel is inverted, a raw read of 0 means 100% brake. The ECU can detect this condition and alert the Admin PC to take action.
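A minimal sketch of the check described above, written in Python for illustration only (the ECU itself runs FreeRTOS, so the real check would live in its sensor task). The function names `parse_raw` and `teensy_failed` are hypothetical, and the rule "both raw channels read exactly zero" is an assumption taken from the log excerpt above.

```python
import re

# Matches the raw-value log line shown above, e.g.
# "50:43:16.715 (prvInfoTask:raw) throttle: 0, brake: 0"
RAW_LINE = re.compile(
    r"\(prvInfoTask:raw\)\s+throttle:\s*(\d+),\s*brake:\s*(\d+)"
)


def parse_raw(line):
    """Return (throttle, brake) from a raw log line, or None if it is not one."""
    match = RAW_LINE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def teensy_failed(throttle_raw, brake_raw):
    """Assumed failure signature: both raw channels read exactly zero."""
    return throttle_raw == 0 and brake_raw == 0
```

With the inverted brake channel, a healthy idle system would be expected to report a non-zero raw brake value, which is why an all-zero pair is treated as suspicious here.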

I suggest sending a CMD_COMPONENT_ERROR(ID_TEENSY) or CMD_COMPONENT_RESTART(ID_TEENSY) message from the ECU over the UDP CAN bus; that way we don't have to modify the ECU code to handle TCP connections.
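As a rough illustration of the proposed message, the sketch below packs a component-error frame into a UDP datagram. Everything concrete here is an assumption: the numeric values of `CMD_COMPONENT_ERROR` and `ID_TEENSY`, the frame layout (4-byte ID, 1-byte length, payload), and the helper names are hypothetical and not taken from the project's CAN specification.

```python
import struct

# Hypothetical values; the real IDs are defined by the project's CAN spec.
CMD_COMPONENT_ERROR = 0x01
ID_TEENSY = 0x10

# Hypothetical wire layout: 4-byte big-endian CAN ID, 1-byte payload length.
HEADER = struct.Struct("!IB")


def encode_frame(can_id, payload):
    """Pack a CAN-style frame into bytes suitable for one UDP datagram."""
    if len(payload) > 8:
        raise ValueError("classic CAN payload is at most 8 bytes")
    return HEADER.pack(can_id, len(payload)) + payload


def decode_frame(datagram):
    """Inverse of encode_frame: return (can_id, payload)."""
    can_id, length = HEADER.unpack_from(datagram)
    return can_id, datagram[HEADER.size:HEADER.size + length]


def send_component_error(sock, address, component_id):
    """Send one error frame on an already-created UDP socket."""
    frame = encode_frame(CMD_COMPONENT_ERROR, bytes([component_id]))
    sock.sendto(frame, address)
```

A caller would create the socket with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` and pass the Admin PC's address. Because UDP gives no delivery guarantee, the sender cannot assume any single frame arrives.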

@dmzimmerman @EthanJamesLew Would that be a reasonable solution? I can implement handling of such a message in the Admin PC, but wanted to check first.

dmzimmerman commented 3 years ago

This sounds like a reasonable solution to me. Certainly much nicer than making the ECU talk TCP.

dmzimmerman commented 3 years ago

We need to figure out some way to make sure that the Admin PC isn't repeatedly told to kick a dead Teensy, by virtue of the ECU repeatedly getting 0 for raw values... but also can't rely on a single UDP message to get through... because you could easily imagine a cycle of "Teensy dies... Admin PC told to kick it... Teensy comes back up but Admin PC gets a UDP packet telling it to kick it, so it kicks it again... ECU sees dead Teensy while it's being kicked and sends yet another UDP packet to kick it... etc."

podhrmic commented 3 years ago

This should be done on the ignition side, so we don't need to modify FreeRTOS any further.

dmzimmerman commented 3 years ago

Do which part on the ignition side? I didn't think the ECU propagated the raw data values, so the error can't be detected on the ignition side... and the repeated kicking, I think, is hard to avoid without some kind of sequence numbering. It could be done in a very rough way with a cooldown time after reset due to CMD_COMPONENT_ERROR (or RESTART), I suppose.
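The cooldown idea mentioned above could look roughly like the following on the Admin PC side. This is only a sketch: the class name `RestartGuard`, the injectable `clock` parameter, and any particular cooldown length are hypothetical, and it deliberately does not implement sequence numbering.

```python
import time


class RestartGuard:
    """Ignore repeated restart requests that arrive within a cooldown window."""

    def __init__(self, cooldown_s, clock=time.monotonic):
        self.cooldown_s = cooldown_s
        self.clock = clock          # injectable so the logic can be tested
        self.last_restart = None    # time of the last accepted restart

    def should_restart(self):
        """Return True if a restart should be performed for this request."""
        now = self.clock()
        in_cooldown = (
            self.last_restart is not None
            and now - self.last_restart < self.cooldown_s
        )
        if in_cooldown:
            return False
        self.last_restart = now
        return True
```

The Admin PC would call `should_restart()` for every incoming error message and kick the Teensy only when it returns `True`. The cooldown would need to be longer than the time the Teensy takes to come back up, so that stale or in-flight error messages sent during the restart are dropped.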

podhrmic commented 3 years ago

I am just trying to minimize the number of changes FreeRTOS needs. But I guess sending a CMD_COMPONENT_ERROR message iff there is a Teensy failure is minimal enough. The Admin PC can then deal with the rest (and have some fancy logic as well).

dmzimmerman commented 3 years ago

Understood. I think the detection does have to happen on the FreeRTOS side, but it can be dumb about how it deals with it (just send the error message after every bad sensor read or the like) and we can try to deal with it in an appropriate way on the admin PC side.

podhrmic commented 3 years ago

Fixed in #1274