SeongGino / ir-light-gun-plus

Arduino powered IR light gun - with force-feedback additions, MAMEHOOKER support, quality of life changes, and (possibly) dubious code quality!
GNU Lesser General Public License v2.1
27 stars 3 forks source link

TinyUSB - Chance of RP2040 crashing when Serial communications are happening while input data is sent #3

Open SeongGino opened 10 months ago

SeongGino commented 10 months ago

Blocking issue for mamehook support, but not due to code added from that - it's an issue I'd discovered early on with the original source but never paid it much mind until now.

It's possibly more likely to happen on, but perhaps not exclusive to RP2040. Must investigate.

SeongGino commented 10 months ago

It very much seems to come down to a race condition, and was further exacerbated by use of dual cores; hopefully 47a7b7c773c1a30f459dd1eca32dfc3bfab73f82 mitigates this issue, if not fixing outright.

If it doesn't, test with DUAL_CORE turned off - if it doesn't crash with that off, then parallelization may be causing more problems than solves, at least when serial connections are involved.

Yeah, still ran into issues. So now 8bb26ee7dc89361ef184fda18c6dd41e2d4c705b just adds delays.

Interestingly, the TinyUSB library does allow changing the polling rate. Would we benefit from increasing that to 1ms?

SeongGino commented 10 months ago

For future reference, a list of things tried/to try:

Main reason I'm hesitant on changing the USB polling rate to 1(ms) is processing overhead on original Cortex-M0's, if they're still a desired target - it works fine on an RP2040 in single threaded mode, which means it's also likely to work on a Cortex-M4. Is this what GUN4IR does?

SeongGino commented 10 months ago

The commonality with all the tests, is that the device only freezes on the frame an input is sent, and the solenoid is commanded to pull but before it's allowed to rest, causing it to stick (testing this with Haunted Museum for the record).

If that's the case, the only way I know to solve it is to only run the Serial Mode-specific button handling only when irPosUpdateTick is set, while the serial line isn't busy - so essentially, in serial mode, we limit the button polling from "as fast as the board fires" to the fixed camera update rate of ~209hz.

At least a max polling rate of 209hz is still sending inputs every ~4.8ms. It's a good thing for my sanity that serial handoff has us only handling the buttons. (^^;

SeongGino commented 10 months ago

Not pushed to the repo, but even with an internal test where the button states are polled separately from the button responses, and we only send the updated button responses once alongside every camera update... it's still crashing.

At this point, I'm at a loss. Either there's an additional place where a delay needs to be placed, there's a fatal bug in the RP2040 core that's somehow gone unreported, or we'll have to can Mamehook support as a headlining feature and put it behind a forever experimental state (which I REALLY don't want to have to do, but figuring this out on my lonesome has been taking its toll and I want to have a new build pushed sooner rather than later so I can just take a break already...).

For what it's worth, we should also check other optimization levels (we use -O3 by default, by Prow's original suggestion), to make sure that isn't somehow causing issues. No change in behavior here either.

SeongGino commented 10 months ago

I think we might be getting somewhere with 385c90022d629523133b7053b4a804234c1d8128 - alternated between a stage and a half of Operation G.H.O.S.T., two stages of Virtua Cop (Model 2), and a full playthrough (again) of Haunted Museum for good measure on the same go, and the gun has had yet to crash... so far.

It may very well just be linked to the trigger button/mouse click signals - and if that is so, and a 2ms delay isn't noticeable on games that, if we're only talking about MAMEHOOK-compatible games, at best refresh at 16.67ms anyways, this may well be the needed solution!

Or at least, it's a hack that I'm okay with doing if it finishes this damn feature lol.

NOTE: this is with single core, haven't tested dual core yet. Likely that this does fix it there too (since the change is at a subroutine level that both loop()s access) but should check regardless.

SeongGino commented 10 months ago

None of the delays help, which maybe I was naive to think they would do anything substantial...

Because the issue is the device "sticking" (wherein it is doing SOMETHING, but simply not changing, so sols or motors remain powered - which is very distinctly not a crash), that would mean there's some point where an infinite loop gets stuck.

A quick once-over shows that potential failure points are either in the DFRobotIRPositionEx library: https://github.com/SeongGino/ir-light-gun-plus/blob/24a81139316f8e518afc501db128135336100ff5/libraries/DFRobotIRPositionEx/DFRobotIRPositionEx.cpp#L172-L175

Or in the serial reading: https://github.com/SeongGino/ir-light-gun-plus/blob/24a81139316f8e518afc501db128135336100ff5/SamcoEnhanced/SamcoEnhanced.ino#L1163-L1168

I think I'll try experimenting with these next to remove any more chances of being left in a stalling state.

SeongGino commented 10 months ago

Investigated the above.

still. locks up.

SeongGino commented 10 months ago

At least I've narrowed that camera movement doesn't contribute to the problem, so it isn't I2C. Just spamming trigger when not seen still causes it.

So it is down to the timing of inputs being sent vs serial data being read/sent.

SeongGino commented 10 months ago

Even with the serial and data lines taking turns, it continues to crash.

And judging by https://github.com/adafruit/Adafruit_TinyUSB_Arduino/issues/293, it does in fact seem to be a fault of the USB stack and not the code itself! (thank goodness)

I've done a test with both patches in that issue thread, and it seems to resolve the fault. If it really is down to a deadlock in TinyUSB, then we could either wait until it is resolved upstream, or provide instructions for users to install the "fixed" RP2040 core.

Obviously, tampering with Arduino cores directly like this is another challenge for the less technically adept; but if this issue's shown anything, I have tried literally everything I could on the sketch side to no avail, so this will be what it comes down to.