FletcherS7 closed 1 month ago
Hmm. I tested a bunch this morning at home but wasn't able to reproduce the issue. I don't have two Linksys APs, but I have one plus a Vivid AP (and a Catalyst switch). I tried the following:
I did once observe that restarting Cheesy Arena while a configure was still in progress got both AP types stuck in a loop. I had to kill CA, power cycle both APs, and wait for them both to come back up before starting CA again.
The boot loop seems to happen when it takes more than a minute or so to commit the match and teams for the new match are already connected to the field. On clicking match commit, the network reloads and one or both APs get stuck in a crash loop. The server console shows it is trying to send the no-team-# config at the time. Killing CA and power cycling the APs did not fix the issue for one AP; I had to reload the default config on that AP from its web page and then start CA. It then worked for a few matches, all of which were committed quickly, until there was a 2-3 minute delay before committing. We had 4 of the robots fully connected, and about 30 seconds after committing, the network (the switch and the APs) reloaded. This is with your latest fix. Judging by the console log and the AP lights, both APs would simply crash whenever CA sent them a config.
That's odd; with my fix from yesterday (87b03f27227f3e1a693ed7a841858c6b2e95c79d), there should be no reason for the network reconfiguration to get triggered during match commit in qual matches (Nexus comes into play otherwise), no matter how long you wait, even if the preload after the previous match doesn't succeed.
My guess would be that there's some other factor at play here sending the APs into the loop.
Running in Single AP mode, the AP got stuck in a crash loop after substituting two teams in playoffs.
It seems to be an issue with trying to reconfigure the APs too soon after the last change while 2-3 clients are connected. I think the old way of reloading, by loading a test match, may have been slower and so caused fewer issues. It looks like the reload on commit was causing our issues, and something is still triggering the reload when it isn't needed.
My long-term plan to fix this is to put a small REST API on the AP itself to handle configuration and status reporting; it'll be able to better protect against race conditions and multiple simultaneous configuration attempts.
This was mostly solved by using the API to configure the Linksys instead of SSH. In addition, the move to the VH-113 renders this issue moot.
Follow-up to https://github.com/Team254/cheesy-arena/issues/165
Looks like this issue was not present in https://github.com/Team254/cheesy-arena/commit/6171f0dfb84f29edbdf368dd3ef71b8c58825064 (this was the commit we ran all of the Capital City Classic quals on).
We only switched to https://github.com/Team254/cheesy-arena/commit/888b8d468b5a66a709748cc197eb8a36ee2bbc94 for playoffs at Capital City Classic, so it's possible that https://github.com/Team254/cheesy-arena/commit/888b8d468b5a66a709748cc197eb8a36ee2bbc94 was breaking for qual matches.