Team-Silver-Sphere / SquadJS

Squad Server Script Framework
Boost Software License 1.0
166 stars 125 forks source link

Improve RCON packet Parsing and Cleanup on connection close to prevent deadlocks #267

Closed ect0s closed 10 months ago

ect0s commented 1 year ago

Restructure Packet Decode Add counter for packets/callbacks onClose() cleanups unknown packet should call onClose().

This isn't production ready, but I felt the need to pull request it and solicit feedback.

This is primarily to fix the issues as noted in https://github.com/Team-Silver-Sphere/SquadJS/issues/200 and https://github.com/Team-Silver-Sphere/SquadJS/issues/249

This should run as is, I have tested to above 200k RCON command requests, and across server restarts. Before these changes I could cause an error within the first few thousand requests.

The error handling, in particular, could be improved, but I was trying to keep the code broadly similar to the current release.

Longer term, a rewrite might incorporate the new 'count/id' field/callback tracking/retries into a single object, thus bringing some relevant code together

lbzepoqo commented 1 year ago

I've been using these changes without restarting my SquadJS and haven't encountered a single unknown packet since then.

Before, we were experiencing an issue where Admin Requests plugin, JetDave's map vote and Whitelister, along with other commands doesn't get read by SquadJS, but the process is still running and broadcasts were running fine. As I was investigating recently, we usually get 5 or more unknown packets within a single day and this is just when we notice that things doesn't work any more than we restart. As I discovered process.abort(), I included it in the code to crash our SquadJS instead of me manually restarting SquadJS and actively monitoring all day. In a span of 7-8 hours of active time, we get around 5+ of these unknown packets per day, and we had one even if the server wasn't full yet at 6/100.

My first encounter with this issue can be found in a Discord message here. Since then, we've been running SquadJS to restart every ONE hour as suggested by Forrest Gump who also had an instance in Southnode where the SquadJS is located on a different server and is using FTP. They haven't experienced this when they went dedicated. I even requested for JetDave's AutomaticSubsystemRestarter as a workaround.

Also to mention, when an unknown packet appears in our log, the DiscordSubsystemRestarter is enough to fix the issue but still we had to do it manually. The problem with this remedy is that it doesn't restore the functionality of plugins using the websocket (for example: Whitelister's !profile command). There's also an issue where using the DiscordSubsystemRestarter or JetDave's AutomaticSubsystemRestarter sometimes causes this issue #262 which lead me to the option to just abort the process once the issue occurs.

I've been running with this for almost 3 days straight already without any unknown packets and restarts so far. I had to restart SquadJS earlier due to a bug fix for the map vote and will continue to test it and run it the whole week.

This is the first time we reached more than 24 hours without restarting and any issues. It's so weird to look at our directory only containing one log file.

ect0s commented 1 year ago

Small update above;

Found that commands run on timers (such as the updatePlayer lists etc) can sometimes run before we have been authorized, added a flag.

I've let this run for 3 days, including during/over several restarts (including a multihour restart from a server migration).

So far, no issues.

werewolfboy13 commented 10 months ago

@fantinodavide can you verify that I pushed this correctly?

steelskillet commented 10 months ago

That did not work correctly. Please revert.

fantinodavide commented 10 months ago

@werewolfboy13 lots of changes after the merge and even if it works, all the new changes made by ect0s have been removed

werewolfboy13 commented 10 months ago

Broke the merge, new PR to fix. Closing.