Open SeongGino opened 1 year ago
It very much seems to come down to a race condition, and was further exacerbated by use of dual cores; hopefully 47a7b7c773c1a30f459dd1eca32dfc3bfab73f82 mitigates this issue, if not fixing outright.
If it doesn't, test with DUAL_CORE
turned off - if it doesn't crash with that off, then parallelization may be causing more problems than solves, at least when serial connections are involved.
Yeah, still ran into issues. So now 8bb26ee7dc89361ef184fda18c6dd41e2d4c705b just adds delays.
Interestingly, the TinyUSB library does allow changing the polling rate. Would we benefit from increasing that to 1ms?
For future reference, a list of things tried/to try:
DUAL_CORE
on its own: CrashesusbHid.setPollInterval(1);
: Crashesdelay(1);
s after serialMode input processing: CrashesusbHid.setPollInterval(1);
& delay(1);
after camera processing block: delay(1);
after camera processing block only: CrashesDUAL_CORE
w/ usbHid.setPollInterval(1);
& delay(1);
after camera processing block: CrashesMain reason I'm hesitant on changing the USB polling rate to 1(ms) is processing overhead on original Cortex-M0's, if they're still a desired target - it works fine on an RP2040 in single threaded mode, which means it's also likely to work on a Cortex-M4. Is this what GUN4IR does?
The commonality with all the tests, is that the device only freezes on the frame an input is sent, and the solenoid is commanded to pull but before it's allowed to rest, causing it to stick (testing this with Haunted Museum for the record).
If that's the case, the only way I know to solve it is to only run the Serial Mode-specific button handling only when irPosUpdateTick
is set, while the serial line isn't busy - so essentially, in serial mode, we limit the button polling from "as fast as the board fires" to the fixed camera update rate of ~209hz.
At least a max polling rate of 209hz is still sending inputs every ~4.8ms. It's a good thing for my sanity that serial handoff has us only handling the buttons. (^^;
Not pushed to the repo, but even with an internal test where the button states are polled separately from the button responses, and we only send the updated button responses once alongside every camera update... it's still crashing.
At this point, I'm at a loss. Either there's an additional place where a delay needs to be placed, there's a fatal bug in the RP2040 core that's somehow gone unreported, or we'll have to can Mamehook support as a headlining feature and put it behind a forever experimental state (which I REALLY don't want to have to do, but figuring this out on my lonesome has been taking its toll and I want to have a new build pushed sooner rather than later so I can just take a break already...).
For what it's worth, we should also check other optimization levels (we use No change in behavior here either.-O3
by default, by Prow's original suggestion), to make sure that isn't somehow causing issues.
I think we might be getting somewhere with 385c90022d629523133b7053b4a804234c1d8128 - alternated between a stage and a half of Operation G.H.O.S.T., two stages of Virtua Cop (Model 2), and a full playthrough (again) of Haunted Museum for good measure on the same go, and the gun has had yet to crash... so far.
It may very well just be linked to the trigger button/mouse click signals - and if that is so, and a 2ms delay isn't noticeable on games that, if we're only talking about MAMEHOOK-compatible games, at best refresh at 16.67ms anyways, this may well be the needed solution!
Or at least, it's a hack that I'm okay with doing if it finishes this damn feature lol.
NOTE: this is with single core, haven't tested dual core yet. Likely that this does fix it there too (since the change is at a subroutine level that both loop()s access) but should check regardless.
None of the delays help, which maybe I was naive to think they would do anything substantial...
Because the issue is the device "sticking" (wherein it is doing SOMETHING, but simply not changing, so sols or motors remain powered - which is very distinctly not a crash), that would mean there's some point where an infinite loop gets stuck.
A quick once-over shows that potential failure points are either in the DFRobotIRPositionEx library: https://github.com/SeongGino/ir-light-gun-plus/blob/24a81139316f8e518afc501db128135336100ff5/libraries/DFRobotIRPositionEx/DFRobotIRPositionEx.cpp#L172-L175
Or in the serial reading: https://github.com/SeongGino/ir-light-gun-plus/blob/24a81139316f8e518afc501db128135336100ff5/SamcoEnhanced/SamcoEnhanced.ino#L1163-L1168
I think I'll try experimenting with these next to remove any more chances of being left in a stalling state.
Investigated the above.
still. locks up.
At least I've narrowed that camera movement doesn't contribute to the problem, so it isn't I2C. Just spamming trigger when not seen still causes it.
So it is down to the timing of inputs being sent vs serial data being read/sent.
Even with the serial and data lines taking turns, it continues to crash.
And judging by https://github.com/adafruit/Adafruit_TinyUSB_Arduino/issues/293, it does in fact seem to be a fault of the USB stack and not the code itself! (thank goodness)
I've done a test with both patches in that issue thread, and it seems to resolve the fault. If it really is down to a deadlock in TinyUSB, then we could either wait until it is resolved upstream, or provide instructions for users to install the "fixed" RP2040 core.
Obviously, tampering with Arduino cores directly like this is another challenge for the less technically adept; but if this issue's shown anything, I have tried literally everything I could on the sketch side to no avail, so this will be what it comes down to.
Blocking issue for mamehook support, but not due to code added from that - it's an issue I'd discovered early on with the original source but never paid it much mind until now.
It's possibly more likely to happen on, but perhaps not exclusive to RP2040. Must investigate.