CE-Programming / toolchain

Toolchain and libraries for C/C++ programming on the TI-84+ CE calculator series
https://ce-programming.github.io/toolchain/index.html
GNU Lesser General Public License v3.0

USB Device getting stuck in .waitDma loop after rapid transfers #482

Open LukeBorowy opened 6 months ago

LukeBorowy commented 6 months ago

When sending and receiving many transfers in sequence, usb_HandleEvents can freeze and never return. This can be demonstrated with the attached project, a slightly modified version of the link_library example that sends a large number of transfers. To test this program you will need 2 calculators and the appropriate cable. The freeze only ever happens on the "device" calculator, not the host.

link_library.zip

I added some debugging logs to usbdrvce and recompiled the toolchain to figure out where it was freezing. It is in the _ExecuteDma function, specifically getting stuck in the .waitDma loop. It does exit with an error if you exit the program on the host calculator, which is kind of weird to me since the device is the one that is frozen.
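To illustrate the failure mode: a wait loop with no exit condition other than the awaited event hangs forever if that event never arrives. Below is a minimal sketch of a bounded polling loop in plain C. It is not the usbdrvce implementation (the real .waitDma loop lives in the library's assembly), and `wait_bounded`, `poll_fn`, and `countdown_done` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predicate type: returns true once the awaited condition holds. */
typedef bool (*poll_fn)(void *ctx);

/* Poll `done` at most `max_spins` times.
 * Returns true if the condition became true, false if the budget ran out. */
static bool wait_bounded(poll_fn done, void *ctx, uint32_t max_spins)
{
    assert(done != NULL);
    for (uint32_t i = 0; i < max_spins; i++) {
        if (done(ctx)) {
            return true;
        }
    }
    return false; /* the caller can report an error instead of hanging */
}

/* Illustration-only predicate: becomes true once the counter reaches zero. */
static bool countdown_done(void *ctx)
{
    uint32_t *remaining = ctx;
    if (*remaining == 0) {
        return true;
    }
    (*remaining)--;
    return false;
}
```

With a loop of this shape, a condition that never becomes true turns into a `false` return that the caller can surface as a transfer error, rather than a freeze.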

I discovered this bug when adding multiplayer support to my game. Normally, I am not sending nearly this much data. However, after a few minutes have passed (anywhere from 1 to 20), the freeze happens. I believe it has something to do with the exact timing of send and receive transfers finishing, and it just takes a while to get unlucky. It happens very quickly when I spam transfers as in this example, since that makes it much more likely to hit the bad timing. Occasionally it also gives me bad/corrupted data on read instead of freezing, but that is harder to reproduce.

Video of the issue: Note when the device stops blinking. Notably, the host seems to think that the transfer of "H" was complete, and that it was now sending "Q". However, the device has frozen before it even returns from reading "H".

https://github.com/CE-Programming/toolchain/assets/28664080/6c7358cc-f82e-4cdc-8f53-b286e09575fa

Hopefully this is just a coding error on my part, but as of now it seems to be in the library.

acagliano commented 3 months ago

Do you know which status code you get? In my issue, which I thought I had fixed but apparently hadn't, something happens after a lot of sequential transfers (on either the device or the host, I'm not sure which), and any subsequent transfers queued on the endpoint that is handling the heavy traffic start failing immediately with error code 80 (10100000 binary). For the record, that error is USB_TRANSFER_CANCELED | USB_TRANSFER_BUS_ERROR.
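Since the status is a bitmask, it can help to print it as flag names rather than as a raw number. The following is a sketch under stated assumptions: the bit positions are my guess at a layout consistent with the two values quoted in this thread (10100000 binary for canceled plus bus error, and 3 for stalled plus no device), and both the local table and the `describe_status` helper are hypothetical. Check the real values against usbdrvce.h before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* ASSUMED bit layout (bit 0 first); verify against usbdrvce.h. */
static const char *const flag_names[8] = {
    "STALLED", "NO_DEVICE", "HOST_ERROR", "ERROR",
    "OVERFLOW", "BUS_ERROR", "FAILED", "CANCELLED",
};

/* Write a "A | B" style description of `status` into `out`.
 * Returns the number of characters written, excluding the terminator. */
static size_t describe_status(unsigned status, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    if ((status & 0xFFu) == 0) {
        snprintf(out, out_size, "COMPLETED");
        return strlen(out);
    }
    size_t used = 0;
    for (unsigned bit = 0; bit < 8; bit++) {
        if (!(status & (1u << bit))) {
            continue;
        }
        int n = snprintf(out + used, out_size - used, "%s%s",
                         used ? " | " : "", flag_names[bit]);
        if (n < 0 || (size_t)n >= out_size - used) {
            return strlen(out); /* output truncated; stop here */
        }
        used += (size_t)n;
    }
    return used;
}
```

Under the assumed layout, `describe_status(0xA0, buf, sizeof buf)` fills `buf` with the two flags named above, which makes logs from the host and the device easier to compare.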

LukeBorowy commented 3 months ago

I don’t have access to calculators to test right now, but I’m pretty sure neither the host nor the device got errors when queuing a transfer. On the host it just looked like the transfer was still in progress, with no error. The only error (I think) occurred when it was unplugged, at which point the device trying to read (understandably) got 003 = USB_TRANSFER_STALLED | USB_TRANSFER_NO_DEVICE.

That’s what makes this so annoying. If the code got any sort of indication of an error when the issue actually happened, I could try to do something to reset the connection to make it respond again. However, the host thinks everything is fine and I can’t do anything on the device since it is frozen, so there’s no way to recover without physically disconnecting them. (I’m pretty sure that I checked for all the statuses on the host, but I can’t confirm that).
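One possible workaround on the host side, sketched here as an assumption rather than a tested fix, is a watchdog that notices when a queued transfer has been pending for too long. The struct and function names below are hypothetical and are not part of usbdrvce; the tick source could be something like `clock()` from `<time.h>`, and what to do on expiry (for example, resetting the connection) is left to the caller. Whether this actually unfreezes the device is unverified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-transfer watchdog state. */
typedef struct {
    bool     pending; /* a transfer is queued and not yet completed */
    uint32_t started; /* tick count when the transfer was queued */
    uint32_t timeout; /* maximum allowed ticks before giving up */
} xfer_watchdog_t;

/* Call right after queuing a transfer. */
static void watchdog_arm(xfer_watchdog_t *w, uint32_t now, uint32_t timeout)
{
    assert(w != NULL);
    w->pending = true;
    w->started = now;
    w->timeout = timeout;
}

/* Call from the transfer's completion callback. */
static void watchdog_disarm(xfer_watchdog_t *w)
{
    assert(w != NULL);
    w->pending = false;
}

/* Call periodically from the main loop.
 * Unsigned subtraction keeps the comparison correct across tick wraparound. */
static bool watchdog_expired(const xfer_watchdog_t *w, uint32_t now)
{
    assert(w != NULL);
    return w->pending && (uint32_t)(now - w->started) >= w->timeout;
}
```

The appeal of this design is that it needs no cooperation from the frozen device: the host only compares tick counts, so it can at least report the stall instead of waiting indefinitely.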

I noticed in your linked issue that it only happens with high traffic. In my case, it seems to freeze eventually even with low traffic, which leads me to believe it has to do with precise timing. High traffic just makes it more likely to hit the “bad” time.

acagliano commented 3 months ago

Thanks for the response. I tracked down the source of my issue, and the two are in fact not related; yours is actually in usbdrvce. Mine was caused by my driver code not performing a step properly, though for a while it was presenting as the same issue.