downforacross / downforacross.com

Web frontend for downforacross.com -- continuation of stevenhao/crosswordsio
https://downforacrosscom.downforacross1.now.sh
MIT License
229 stars, 91 forks

Client regularly losing sync #110

Closed: billyjanitsch closed this issue 3 years ago

billyjanitsch commented 4 years ago

Hi! Thanks for your work on this project.

In the past 1-2 days, the client has started to lose sync very frequently (every ~30 seconds). When it happens, the client stops receiving updates. The "Ahead" count at the bottom continues to increment for every local action.
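The "Ahead" count described above behaves like a queue of local edits that the server has not yet acknowledged. A minimal sketch, assuming each local action carries a client-generated id and the server echoes that id back as an ack (the PendingEvents class and its method names are hypothetical, not the project's code):

```typescript
// Hypothetical model of the "Ahead" counter: local edits that have been
// sent to the server but not yet acknowledged.
class PendingEvents<T> {
  private pending = new Map<string, T>();

  // Record a local action as soon as it is emitted to the server.
  push(id: string, event: T): void {
    this.pending.set(id, event);
  }

  // Drop the action once the server acknowledges it by id.
  ack(id: string): void {
    this.pending.delete(id);
  }

  // The number the UI would show as "n ahead".
  get ahead(): number {
    return this.pending.size;
  }

  // Unacknowledged events, oldest first (a Map preserves insertion order).
  unacked(): T[] {
    return Array.from(this.pending.values());
  }
}

const queue = new PendingEvents<string>();
queue.push("e1", "fill A1");
queue.push("e2", "fill A2");
queue.ack("e1");
queue.push("e3", "fill A3");
console.log(queue.ahead); // 2
```

If acks stop arriving while the user keeps typing, only push() is ever called, so the count increments with every local action, which matches the symptom in the report.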

It's fixed by refreshing the client.

stevenhao commented 4 years ago

Thanks for reporting this! I'll look into scaling the backend and setting up an Amazon Elastic Load Balancer...

stevenhao commented 4 years ago

Actually, it seems like this is a transient issue. I'm running a network-intensive backfill on the same server (probably not a good idea, but 🤷 ...), which is probably interfering with the server's availability. Will reopen if it continues to happen tomorrow.

stevenhao commented 4 years ago

I just experienced this, so clearly the issue is still live. Will try to add some monitoring in the next couple of days.

stevenhao commented 4 years ago

I will also look into making socket.io auto-reconnect when it detects that it's disconnected.
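socket.io-client already reconnects by default (its reconnection option) and spaces out attempts using options such as reconnectionDelay and reconnectionDelayMax. The sketch below only illustrates the capped exponential backoff idea behind settings like these; the backoffDelay function and its jitter handling are hypothetical and are not socket.io's own code.

```typescript
// Hypothetical capped exponential backoff for reconnect attempts.
// attempt: 0 for the first retry, 1 for the second, and so on.
// jitter: fraction of the delay to randomize (0 disables it).
// random: injectable random source in [0, 1), so the function is testable.
function backoffDelay(
  attempt: number,
  baseMs = 1000,
  maxMs = 5000,
  jitter = 0,
  random: () => number = Math.random,
): number {
  // Double the delay on each attempt, but never exceed maxMs.
  const raw = Math.min(maxMs, baseMs * 2 ** attempt);
  if (jitter === 0) return raw;
  // Spread retries around raw so that many clients dropped at the same
  // moment do not all retry together; then re-apply the cap.
  const spread = raw * jitter * (2 * random() - 1);
  return Math.min(maxMs, Math.max(0, Math.round(raw + spread)));
}

// Without jitter: 1000, 2000, 4000, 5000, 5000 ms.
const delays = [0, 1, 2, 3, 4].map((n) => backoffDelay(n));
console.log(delays.join(", "));
```

Capping the delay keeps a client that has been offline for a while from waiting minutes between attempts, while the jitter avoids a burst of simultaneous reconnects when the server comes back.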

benjaminjkraft commented 4 years ago

Some coworkers and I were seeing this kind of problem a lot today. In many cases, the console made it look like the websocket was connected, and my work was saved, but the client stopped getting acks or updates back from the server (so the UI said "n ahead", and I didn't see others' updates). Let me know if there's more info that would be helpful -- and thanks for the site!
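The symptom described here, where the socket looks connected but acks stop arriving, is something a transport-level "connected" flag alone cannot catch. A minimal sketch of an application-level watchdog, assuming the client records a timestamp for each emitted event and clears it on the matching ack; AckWatchdog, timeoutMs, and forcing a reconnect when it fires are hypothetical choices, not behavior of the site or of socket.io:

```typescript
// Hypothetical watchdog: flags a connection as stalled when any
// unacknowledged event has waited longer than timeoutMs, even if the
// transport still reports itself as connected.
class AckWatchdog {
  private readonly timeoutMs: number;
  private sentAt = new Map<string, number>();

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  // Call when an event is emitted; now is a timestamp in milliseconds.
  sent(id: string, now: number): void {
    this.sentAt.set(id, now);
  }

  // Call when the server acknowledges an event.
  acked(id: string): void {
    this.sentAt.delete(id);
  }

  // True if some pending event has been waiting longer than the timeout.
  isStalled(now: number): boolean {
    return Array.from(this.sentAt.values()).some(
      (t) => now - t > this.timeoutMs,
    );
  }
}

const watchdog = new AckWatchdog(10_000);
watchdog.sent("e1", 0);
watchdog.acked("e1");
watchdog.sent("e2", 1_000);
console.log(watchdog.isStalled(5_000)); // false: e2 has waited 4 s
console.log(watchdog.isStalled(12_000)); // true: e2 has waited 11 s
```

A client could poll isStalled() on a timer and, when it returns true, tear down and reopen the socket instead of waiting for the transport to notice.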

Edit: ah, it does eventually retry, but then it logs "Uncaught TypeError: window.socket.close().then is not a function", and further updates still fail. This is in Firefox.

stevenhao commented 4 years ago

Ah, that's a silly bug; it should be window.socket.close().open().
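The TypeError arises because, in socket.io-client, socket.close() is a synchronous alias of disconnect() that returns the socket itself rather than a Promise, so there is no .then to call; returning the socket is also what lets close().open() chain. The FakeSocket class below is a hypothetical stand-in used only to show that chaining pattern without a server; it is not socket.io's implementation.

```typescript
// Hypothetical stand-in for a socket whose close() and open() are
// synchronous and return the socket itself (fluent chaining).
class FakeSocket {
  connected = true;
  opens = 0;

  close(): this {
    this.connected = false;
    return this; // the socket, not a Promise
  }

  open(): this {
    this.connected = true;
    this.opens += 1;
    return this;
  }
}

const socket = new FakeSocket();

// Because close() returns the socket, there is no .then to call.
const closeResult: unknown = socket.close();
console.log(typeof (closeResult as { then?: unknown }).then); // undefined

// Chaining the next synchronous call works instead.
socket.close().open();
console.log(socket.connected, socket.opens); // true 1
```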

stevenhao commented 4 years ago

The combination of https://github.com/downforacross/downforacross.com/commit/14489350c96934caf463227d29e4507b4ba61c24 and https://github.com/downforacross/downforacross.com/commit/596c2440ec992155642b9bea99493baf4635d3e5 should hopefully mitigate this issue. I will keep looking for the root cause (it smells like some kind of memory or open-socket leak in the server code?), but hopefully users can now have an uninterrupted experience.

benjaminjkraft commented 4 years ago

Things were a little better today. The reconnections seemed to work some of the time, but they don't seem to try to resync lost state, so a reconnection that only works half the time is still pretty annoying.
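One way a reconnect could recover lost state, sketched under the assumption that the server assigns every game event an increasing sequence number and can return all events after a given one: the client remembers the last sequence number it applied, asks for anything newer after reconnecting, and then re-sends its own unacknowledged edits. resync, GameEvent, and this sequence-number protocol are hypothetical; the thread does not describe how the site's sync actually works.

```typescript
// Hypothetical event shape: seq is assigned by the server in increasing order.
interface GameEvent {
  seq: number;
  cell: string;
  value: string;
}

interface ResyncPlan {
  missed: GameEvent[]; // server events to apply locally, in order
  lastSeq: number; // new high-water mark after applying them
  resend: string[]; // ids of local edits to emit again
}

// After a reconnect: take everything newer than lastSeq from the server's
// log, then re-send local edits that were never acknowledged.
function resync(
  lastSeq: number,
  serverLog: GameEvent[],
  unackedIds: string[],
): ResyncPlan {
  const missed = serverLog
    .filter((e) => e.seq > lastSeq) // filter copies, so sort won't mutate serverLog
    .sort((a, b) => a.seq - b.seq);
  const newLast = missed.length > 0 ? missed[missed.length - 1].seq : lastSeq;
  return { missed, lastSeq: newLast, resend: [...unackedIds] };
}

const plan = resync(
  2,
  [
    { seq: 1, cell: "A1", value: "C" },
    { seq: 2, cell: "A2", value: "A" },
    { seq: 3, cell: "A3", value: "T" },
  ],
  ["local-7"],
);
console.log(plan.missed.map((e) => e.cell), plan.lastSeq, plan.resend);
```

Re-sending is only safe if the server ignores duplicate ids, since an earlier comment notes that edits were sometimes saved even though no ack came back.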

If it's useful, I was seeing this on puzzle 450136 around 13:05 PDT today. In the console, it claimed to reconnect, but it never actually started working again.

sneeper commented 4 years ago

This bug was pretty bad tonight: the client was constantly getting out of sync, and removing updates once it was back in sync.

stevenhao commented 4 years ago

OK, I'm going to further mitigate the issues by changing reconnect logic to:

stevenhao commented 4 years ago

Thanks, all, for helping report this issue. If you are experiencing this regularly and want to help further with the investigation:
1) Set socket.io-client's debug logs to maximum verbosity: localStorage.debug = '*'
2) Send any combination of: a screen recording, a screenshot of the logs, or a dump of the logs
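socket.io-client logs through the debug package, which in a browser reads the localStorage.debug setting (it is read when the page loads, so a reload is typically needed) and enables every namespace matching the pattern; '*' matches all of them, including the underlying engine.io-client transport. The function below is a simplified, hypothetical re-implementation of that matching, written only to show which namespaces a pattern turns on; it is not the debug package's code.

```typescript
// Simplified, hypothetical version of debug-style namespace matching.
// A pattern is a comma- or space-separated list; "*" matches any run of
// characters, and a leading "-" excludes matching namespaces.
function namespaceEnabled(pattern: string, namespace: string): boolean {
  const toRegExp = (p: string): RegExp =>
    new RegExp("^" + p.split("*").map(escapeRegExp).join(".*") + "$");

  let enabled = false;
  for (const part of pattern.split(/[\s,]+/).filter(Boolean)) {
    if (part.startsWith("-")) {
      // Exclusions win over any inclusion.
      if (toRegExp(part.slice(1)).test(namespace)) return false;
    } else if (toRegExp(part).test(namespace)) {
      enabled = true;
    }
  }
  return enabled;
}

// Escape regex metacharacters so the "." in "socket.io" matches literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
}

console.log(namespaceEnabled("*", "socket.io-client:manager")); // true
console.log(namespaceEnabled("socket.io-client:*", "engine.io-client:socket")); // false
console.log(namespaceEnabled("*,-engine.io-client:*", "engine.io-client:socket")); // false
```

A narrower pattern such as socket.io-client:* would keep the transport-level engine.io-client logs out, which is why '*' is the maximum-verbosity choice.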

benjaminjkraft commented 3 years ago

Sorry for the delay in reporting this, but we've had far fewer problems with this over the last few weeks. Thanks for whatever you did to fix things, and I'll report as you describe if we have any more!

stevenhao commented 3 years ago

Ah right. I cleaned up a variety of things related to the socket traffic and database operations. I don't know exactly which change did the trick, but it seems to be resolved now :).