Team254 / cheesy-arena

An alternative field management system for the FIRST Robotics Competition.

Issues configuring Linksys AP during quals #169

Closed FletcherS7 closed 1 month ago

FletcherS7 commented 1 year ago

Follow up to https://github.com/Team254/cheesy-arena/issues/165

Looks like this issue was not present in https://github.com/Team254/cheesy-arena/commit/6171f0dfb84f29edbdf368dd3ef71b8c58825064 (this was the commit we ran all of the Capital City Classic quals on).

We only switched to https://github.com/Team254/cheesy-arena/commit/888b8d468b5a66a709748cc197eb8a36ee2bbc94 for playoffs at Capital City Classic, so it's possible that https://github.com/Team254/cheesy-arena/commit/888b8d468b5a66a709748cc197eb8a36ee2bbc94 breaks during quals matches.

patfair commented 1 year ago

Hmm. I tested a bunch this morning at home but wasn't able to reproduce the issue. I don't have two Linksys APs, but I have one plus a Vivid AP (and a Catalyst switch). I tried the following:

  1. Linksys AP as only AP
  2. Vivid AP as only AP
  3. Linksys AP in slot 1 and a dummy IP address in slot 2
  4. Dummy IP address in slot 1 and Linksys AP in slot 2
  5. Linksys AP in slot 1 and Vivid AP in slot 2
  6. Vivid AP in slot 1 and Linksys AP in slot 2

I did once observe that restarting Cheesy Arena while a configure was still in progress got both AP types stuck in a loop. I had to kill CA, power cycle both APs, and wait for them both to come back up before starting CA again.

ejordan376 commented 1 year ago

The boot loop seems to happen when it takes more than a minute or so to commit the match and teams for the new match are already connected to the field. On clicking match commit, the network reloads and one or both APs get stuck in a crash loop. The server console shows that it is trying to send the no-team-# config at the time. Killing CA and power cycling the APs did not fix the issue for one AP; I had to reload the default config on that AP from its web page and then start CA. After that it worked for a few matches, all of which were committed quickly, and then there was a 2-3 minute delay before committing a match. We had 4 of the robots fully connected, and about 30 seconds after committing, the network (the switch and the APs) reloaded. This is with your latest fix. Based on the console log and the AP lights, both APs would just crash when CA sent a config to them.

patfair commented 1 year ago

That's odd; with my fix from yesterday (87b03f27227f3e1a693ed7a841858c6b2e95c79d) there should be no reason for the network reconfiguration to get triggered during match commit in qual matches (Nexus comes into play otherwise), no matter how long you wait, even if the preload after the previous match doesn't succeed.

My guess would be that there's some other factor at play here sending the APs into the loop.

FletcherS7 commented 1 year ago

Running in Single AP mode, the AP got stuck in a crash loop after substituting two teams in playoffs.

ejordan376 commented 1 year ago

It seems to be an issue with trying to reconfigure the APs too soon after the last change while 2-3 clients are connected. I think the old way of reloading, by loading a test match, may have been slower and so caused fewer issues. It looks like the reload on commit was causing our issues, and something is still triggering the reload when it isn't needed.

patfair commented 1 year ago

My long-term plan to fix this is to put a small REST API on the AP itself to handle configuration and status reporting; it'll be able to better protect against race conditions and multiple simultaneous configuration attempts.
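
As a rough illustration of the kind of protection described above, here is a minimal Go sketch of a guard that rejects a second configuration attempt while one is still running. `ConfigGuard`, `Run`, and `ErrConfigInProgress` are hypothetical names made up for this example; this is not the actual API that was later used, and it makes no assumptions about how the AP applies a configuration.

```go
package main

import (
	"errors"
	"sync"
)

// ErrConfigInProgress is returned when a configuration attempt is rejected
// because another one has not finished yet. (Hypothetical name.)
var ErrConfigInProgress = errors.New("configuration already in progress")

// ConfigGuard allows at most one configuration attempt at a time.
// It is an illustrative sketch, not code from Cheesy Arena or AP firmware.
type ConfigGuard struct {
	mu   sync.Mutex
	busy bool
}

// Run calls apply unless another configuration is in flight, in which case
// it returns ErrConfigInProgress immediately instead of queueing.
func (g *ConfigGuard) Run(apply func() error) error {
	g.mu.Lock()
	if g.busy {
		g.mu.Unlock()
		return ErrConfigInProgress
	}
	g.busy = true
	g.mu.Unlock()

	// Clear the busy flag when apply returns, whether or not it failed.
	defer func() {
		g.mu.Lock()
		g.busy = false
		g.mu.Unlock()
	}()
	return apply()
}
```

A REST handler on the AP could wrap its configure step in `Run` and answer with an error status when it gets `ErrConfigInProgress`, so the caller knows to wait rather than push a second config on top of the first. Rejecting instead of queueing is a design choice here; queueing the latest request would be another reasonable option.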

FletcherS7 commented 1 month ago

This was mostly solved by using the API to configure the Linksys AP instead of SSH. In addition, the move to the VH-113 renders this issue moot.