dektronics / printalyzer-timer

F-Stop enlarging timer and print exposure meter

USB host implementation needs to be robust against hub IC failures #64

Open dkonigsberg opened 3 months ago

dkonigsberg commented 3 months ago

The most common failure mode of the issues noted in #56 and #60 is that the USB2422 hub IC stops responding on the upstream USB interface. Probing has shown that the USB lines go completely dead when this glitch happens. The current theory is that this is a result of noise on the clock or signal lines breaking the state machine of the hub.

Regardless, toggling the "VBUS detect" pin on the USB2422 seems to cause it to reset its state and start working again. Performing the toggle is the easy part. Detecting the issue in firmware, and cleanly handling all the state and race issues around the toggle, is a bit more work.

So the firmware needs to be able to detect when the hub enters this state. Detection could be via a timeout on some periodic hub transaction, or more likely a timeout on something the USB host interface always does when idle (such as an SOF interrupt). Then, once the issue is detected, a robust reset cycle should be run. Since it's likely that multiple glitch events happen in a row, a glitch during reset shouldn't be allowed to break anything.
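As a rough sketch of the detect-and-reset cycle described above, the logic could be modeled as a small state machine that watches for missing SOF events and tells the caller when to drive the "VBUS detect" pin. Everything here is an assumption for illustration: the `hub_watchdog_*` names, the timing constants, and the idea of returning a pin action from the poll function are hypothetical and are not taken from the Printalyzer firmware or CherryUSB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical timing values; real ones would need tuning on hardware. */
#define SOF_TIMEOUT_MS   50u   /* no SOF for this long: hub assumed frozen */
#define VBUS_LOW_MS      100u  /* how long to hold "VBUS detect" low */
#define RECOVERY_WAIT_MS 500u  /* time allowed for SOFs to resume */

typedef enum { HUB_OK, HUB_VBUS_LOW, HUB_RECOVERING } hub_state_t;
typedef enum { HUB_ACTION_NONE, HUB_ACTION_VBUS_LOW, HUB_ACTION_VBUS_HIGH } hub_action_t;

typedef struct {
    hub_state_t state;
    uint32_t state_since_ms;       /* when the current state was entered */
    uint32_t reset_count;          /* number of reset cycles started */
    volatile uint32_t last_sof_ms; /* written from the SOF interrupt */
    volatile bool sof_seen;        /* set by SOF, cleared on VBUS release */
} hub_watchdog_t;

void hub_watchdog_init(hub_watchdog_t *w, uint32_t now_ms)
{
    assert(w != 0);
    w->state = HUB_OK;
    w->state_since_ms = now_ms;
    w->reset_count = 0;
    w->last_sof_ms = now_ms;
    w->sof_seen = false;
}

/* Call from the (assumed) SOF interrupt handler. */
void hub_watchdog_on_sof(hub_watchdog_t *w, uint32_t now_ms)
{
    w->last_sof_ms = now_ms;
    w->sof_seen = true;
}

static hub_action_t begin_reset(hub_watchdog_t *w, uint32_t now_ms)
{
    w->state = HUB_VBUS_LOW;
    w->state_since_ms = now_ms;
    w->reset_count++;
    return HUB_ACTION_VBUS_LOW;
}

/* Call periodically; returns the pin action the caller should apply. */
hub_action_t hub_watchdog_poll(hub_watchdog_t *w, uint32_t now_ms)
{
    /* Unsigned subtraction stays correct across tick counter wraparound. */
    uint32_t in_state_ms = now_ms - w->state_since_ms;

    switch (w->state) {
    case HUB_OK:
        if ((uint32_t)(now_ms - w->last_sof_ms) > SOF_TIMEOUT_MS)
            return begin_reset(w, now_ms);
        break;
    case HUB_VBUS_LOW:
        if (in_state_ms >= VBUS_LOW_MS) {
            w->state = HUB_RECOVERING;
            w->state_since_ms = now_ms;
            w->sof_seen = false;
            return HUB_ACTION_VBUS_HIGH;
        }
        break;
    case HUB_RECOVERING:
        if (w->sof_seen) {
            w->state = HUB_OK;
            w->state_since_ms = now_ms;
        } else if (in_state_ms >= RECOVERY_WAIT_MS) {
            /* Glitch during reset: simply start another cycle. */
            return begin_reset(w, now_ms);
        }
        break;
    }
    return HUB_ACTION_NONE;
}
```

Keeping the pin toggle outside the state machine means a glitch during recovery just re-enters the same reset path instead of needing special handling. A real implementation would also have to tear down and restart the host stack state around the toggle, and guard the fields shared with the interrupt handler.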

Ultimately the hope is that hardware fixes will make the occurrence of this glitch rather rare. But as it cannot be eliminated completely, it should still be handled gracefully.

Testing of this code is best performed on the initial "Rev D" hardware configuration, since that is especially vulnerable to the issue. Future updates to the power board and main board may make it harder to deliberately reproduce.

dkonigsberg commented 3 months ago

The first step of resolving this is handled by the following two commits: 598103fcea51e4cb04459859221a0171e1b0a4ae and 1fa5e16ee356f8af803434ec6a69954996ea0347

They solve the issue of the USB layer becoming non-responsive due to the USB2422 freezing up and refusing to respond to USB commands. Further work may be needed to deal with this issue in downstream hubs, particularly handling of various USB device reset (and maybe overcurrent) events that are not currently handled by CherryUSB.
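For the downstream hub case, one possible starting point is to classify the port change bits that a hub reports in response to a GetPortStatus request. The bit positions below follow the `wPortChange` field of the USB 2.0 hub class; the `port_event_t` enum, the `classify_port_change` function, and the priority order are hypothetical and are not part of CherryUSB.

```c
#include <stdint.h>

/* wPortChange bits from the USB 2.0 hub class GetPortStatus response. */
#define PORT_CHANGE_CONNECTION   (1u << 0)
#define PORT_CHANGE_ENABLE       (1u << 1)
#define PORT_CHANGE_SUSPEND      (1u << 2)
#define PORT_CHANGE_OVER_CURRENT (1u << 3)
#define PORT_CHANGE_RESET        (1u << 4)

/* Hypothetical event classification for illustration. */
typedef enum {
    PORT_EVT_NONE,
    PORT_EVT_OVER_CURRENT,
    PORT_EVT_CONNECT_CHANGE,
    PORT_EVT_ERROR_DISABLED,
    PORT_EVT_RESET_DONE
} port_event_t;

/* Pick the most urgent event out of a wPortChange value. */
port_event_t classify_port_change(uint16_t port_change)
{
    if (port_change & PORT_CHANGE_OVER_CURRENT)
        return PORT_EVT_OVER_CURRENT;
    if (port_change & PORT_CHANGE_CONNECTION)
        return PORT_EVT_CONNECT_CHANGE;
    if (port_change & PORT_CHANGE_ENABLE)
        return PORT_EVT_ERROR_DISABLED; /* port disabled by the hub */
    if (port_change & PORT_CHANGE_RESET)
        return PORT_EVT_RESET_DONE;
    return PORT_EVT_NONE;
}
```

Each reported change would still need to be acknowledged with the matching ClearPortFeature request, which this sketch leaves out.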