Closed by noahlevenson 1 year ago
Hypothesis confirmed: web clients do strange and wildly unpredictable things when computers go to sleep. Most of the wacky things we've observed are indeed likely related to sleepy computers.
The TLDR: You can't count on any of the mathematical relationships you observe in Honeycomb to be meaningful. There can be odd numbers of open WebSockets, strange multiples of open signaling messages, lots of open WebSockets and no signaling messages (even in the absence of censored peers)... when it's sleepy time, all bets are off.
There doesn't seem to be any meaningful way to reason about churn, because we'll always have computers connecting and disconnecting in unpredictable ways while they're asleep.
It's worth considering what havoc may be wreaked by computers sending signaling messages while they're asleep. At scale, this would be very destructive.
@woodybury and I did a testing session wherein I monitored the logs while he ran a widget on his MacBook in Chrome, putting the computer to sleep under different circumstances. Some things we saw:
On battery power, the screen goes to sleep. We see all 10 WebSocket connections die, but then immediately all 10 of them are re-established. At this time, all the HTTP requests stop completely. A few minutes later, those 10 WebSockets die and never come back.
Asleep on wall power, the Mac behaves similarly, except we also saw it randomly open 2 HTTP requests to Freddie while asleep.
Asleep on wall power, @woodybury sends a text message to himself, with the intention of discovering whether iMessage activity wakes things. The screen remains black, but we observe the widget open 10 WebSockets and 4 of 6 HTTP requests... followed shortly thereafter by the 2 remaining HTTP requests. A few minutes later, the WebSockets disconnect and the HTTP requests stop.
Asleep on wall power, we observe cycling wherein the computer seems to wake up every now and then, and it opens both WebSockets and HTTP requests, then goes away again.
We saw Chrome do other bizarre stuff, killing WebSockets and then reopening them, just based on minimizing or backgrounding the tab.
At least once, we saw a sleeping widget's 10 WebSockets die, and then it only redialed 9!
Chrome seems to honor open WebRTC connections above all. As long as the widget had open WebRTC connections, the computer would go to sleep but leave the widget running and able to proxy. When we disconnected those WebRTC connections, the computer killed the widget, disconnecting the WebSockets and ceasing to create HTTP requests to Freddie.
This is super interesting. We've had similar worries about core lantern in this state in the past. It came up in particular around marking proxies as blocked because connections die when devices are actually just sleeping. In fact, this is still a leading theory for why our blocking detection frequently thinks that servers in completely uncensored regions are blocked, at which point it dutifully replaces them.
On that side, this is a good example of where using our new bypass blocking detection approach is attractive.
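To make the failure mode concrete, here is a rough sketch of one common heuristic for telling "the device was asleep" apart from "the proxy is blocked". It is not the bypass approach mentioned above and not anything in our codebase. The function names, the `slack` and `grace` parameters, and the idea of recording a periodic tick timestamp are all assumptions for illustration. Timestamps are assumed to be wall-clock seconds, since a monotonic clock may not advance while the machine is suspended on some platforms.

```python
from typing import List, Sequence, Tuple

Window = Tuple[float, float]


def suspected_sleep_windows(
    tick_times: Sequence[float], interval: float, slack: float = 2.0
) -> List[Window]:
    """Return (start, end) pairs where two consecutive ticks are much further
    apart than the expected interval, which suggests the process was suspended.

    tick_times: wall-clock timestamps (seconds) recorded by a periodic timer.
    interval:   how often the timer was supposed to fire, in seconds.
    slack:      hypothetical tolerance factor before a gap counts as a sleep.
    """
    windows: List[Window] = []
    for prev, cur in zip(tick_times, tick_times[1:]):
        if cur - prev > interval * slack:
            windows.append((prev, cur))
    return windows


def failure_during_sleep(
    failure_time: float, windows: Sequence[Window], grace: float = 0.0
) -> bool:
    """True if a connection failure falls inside a suspected sleep window,
    or within `grace` seconds after the device woke up."""
    return any(start <= failure_time <= end + grace for start, end in windows)


# Example: a timer that should fire every 10 s shows a 600 s gap.
ticks = [0.0, 10.0, 20.0, 620.0, 630.0]
windows = suspected_sleep_windows(ticks, interval=10.0)
```

A failure that lands inside one of these windows would be a candidate for "ignore, the device was probably asleep" rather than "mark the proxy as blocked". Whether that is good enough in practice is an open question.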
One of the coolest takeaways was that as long as there's an active WebRTC connection (it doesn't even need to be sending/receiving), Chrome will keep the tab alive. Similar to the old hack of secretly playing blank audio to keep a tab alive.
I hypothesize that web clients start doing strange things when computers go to sleep.
We observed a situation where @Derekf5 left a web widget (not the extension!) running on his office computer, in Chrome, after he left the office. He surmised that his computer had almost definitely gone to sleep.
This coincided with strange connection counts being reported in Honeycomb: 11 WebSockets, and 6 signaling messages.
We know that given our current concurrency settings, WebSockets should be multiples of 10 and signaling messages should be multiples of 5.
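The expectation above is easy to write down as a check. This is only a sketch: the constants come from the sentence above and would change with our concurrency settings, and the function name `find_anomalies` is made up for illustration.

```python
from typing import List

# Taken from the comment above; these depend on the current concurrency settings.
WEBSOCKET_MULTIPLE = 10
SIGNALING_MULTIPLE = 5


def find_anomalies(websockets: int, signaling: int) -> List[str]:
    """Return a description of each count that is not a multiple of its
    expected value. An empty list means the counts look normal."""
    anomalies: List[str] = []
    if websockets % WEBSOCKET_MULTIPLE != 0:
        anomalies.append(
            f"{websockets} WebSockets is not a multiple of {WEBSOCKET_MULTIPLE}"
        )
    if signaling % SIGNALING_MULTIPLE != 0:
        anomalies.append(
            f"{signaling} signaling messages is not a multiple of {SIGNALING_MULTIPLE}"
        )
    return anomalies


# The counts reported above: 11 WebSockets and 6 signaling messages.
for problem in find_anomalies(11, 6):
    print(problem)
```

For the counts reported above, both checks flag an anomaly. A healthy reading such as 10 WebSockets and 5 signaling messages returns an empty list.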
As an experiment, I killed and restarted both of our services, just to see what any connected widgets would do. They immediately created the same 6 signaling messages. But they created 10 WebSockets, disconnected them, and then created 9 WebSockets. Very strange.
To be clear, our servers are designed to tolerate any odd number of connections, and so rogue clients behaving incorrectly aren't a risk to the network. But I've never been able to reproduce these kinds of connection counts, and I wonder if it's just because I don't have a Mac that can go to sleep?
It would be gratifying to understand the source of these weird connection count values when we see them in Honeycomb. We should run an experiment with some of our Mac users.