getlantern / browsersunbounded

Interoperable browser-based P2P proxies for censorship circumvention
GNU General Public License v3.0

Broflake: investigate NAT traversal failure #163

Open noahlevenson opened 1 year ago

noahlevenson commented 1 year ago

Now that we've got an abundance of censored peers on the network, NAT traversal has revealed itself to be a significant problem. If you start a widget and pop open the console, you'll see a nonstop torrent of attempted connections, all of which result in NAT traversal failure. I'd estimate that fewer than 1% of attempted connections succeed at NAT traversal.

This is critical-ish path for the MVP, because the widget -- from the perspective of the user -- still doesn't really do anything. It acquires connections so infrequently that it sorta seems broken. If there are any light lifts we can perform to improve the traversal rate, we should do them now. And if there aren't any light lifts, then at least we should know why.

Thus, the topic of this issue:

Let's aggressively instrument our NAT traversal functions. We just need to understand more about what happens when NAT traversal fails. Who does it fail for? Where do they live? Are they on desktop or mobile? Were they able to gather ICE candidates, and if so, what do their ICE candidates look like? Do we think they're behind CGNATs?
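As a rough illustration of the candidate questions above, here is a minimal Go sketch that tallies ICE candidate types from raw candidate attribute lines and flags host candidates inside the carrier-grade NAT shared address space (100.64.0.0/10, RFC 6598). The `TraversalFailure` struct and `summarizeCandidates` function are hypothetical names, not anything that exists in Broflake, and the CGNAT check is only a weak heuristic: a client whose home router sits behind a CGNAT will still report an ordinary private host address.

```go
package main

import (
	"net/netip"
	"strings"
)

// cgnatPrefix is the shared address space reserved for carrier-grade NAT (RFC 6598).
var cgnatPrefix = netip.MustParsePrefix("100.64.0.0/10")

// TraversalFailure is a hypothetical trace record; the field names are
// illustrative and not taken from the Broflake codebase.
type TraversalFailure struct {
	CandidateTypes map[string]int // count per type: host, srflx, prflx, relay
	BehindCGNAT    bool           // true if any host candidate is in 100.64.0.0/10
}

// summarizeCandidates parses ICE candidate attribute lines of the form
//
//	candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
//
// and skips anything that does not match that shape.
func summarizeCandidates(candidates []string) TraversalFailure {
	f := TraversalFailure{CandidateTypes: map[string]int{}}
	for _, c := range candidates {
		fields := strings.Fields(c)
		if len(fields) < 8 || fields[6] != "typ" {
			continue
		}
		typ := fields[7]
		f.CandidateTypes[typ]++

		// Host candidates may carry mDNS names instead of IPs; ignore those
		// for the CGNAT check.
		addr, err := netip.ParseAddr(fields[4])
		if err != nil {
			continue
		}
		if typ == "host" && cgnatPrefix.Contains(addr) {
			f.BehindCGNAT = true
		}
	}
	return f
}
```

A failure with zero `srflx` candidates would suggest the client never reached a STUN server at all, which is a different problem from a NAT that is too restrictive to pierce.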

Maybe we add NAT behavior discovery (RFC 5780) so that clients can report their NAT type at failure time? It should be relatively trivial with Pion: https://pkg.go.dev/github.com/pion/stun/cmd/stun-nat-behaviour#section-readme
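For reference, the mapping-behavior decision in RFC 5780 (section 4.3) boils down to comparing the mapped addresses returned by three STUN binding requests. The sketch below covers only that comparison step in plain Go; it sends no STUN traffic and uses no Pion API. The `Probe` type and `newProbe` helper are hypothetical, and the sketch assumes all three tests were run and recorded, whereas the RFC only runs Test III when Test II differs from Test I.

```go
package main

import (
	"fmt"
	"net/netip"
)

// MappingBehavior names the NAT mapping categories used by RFC 5780.
type MappingBehavior string

const (
	NoNAT                   MappingBehavior = "no NAT"
	EndpointIndependent     MappingBehavior = "endpoint-independent mapping"
	AddressDependent        MappingBehavior = "address-dependent mapping"
	AddressAndPortDependent MappingBehavior = "address-and-port-dependent mapping"
)

// Probe holds the results of the three mapping tests (hypothetical type).
//   - Test1: mapped address seen by the server's primary address and port
//   - Test2: mapped address seen by the alternate address, primary port
//   - Test3: mapped address seen by the alternate address, alternate port
type Probe struct {
	Local               netip.AddrPort
	Test1, Test2, Test3 netip.AddrPort
}

// newProbe parses "ip:port" strings, e.g. as they might appear in a trace.
func newProbe(local, t1, t2, t3 string) (Probe, error) {
	var p Probe
	dsts := []*netip.AddrPort{&p.Local, &p.Test1, &p.Test2, &p.Test3}
	for i, s := range []string{local, t1, t2, t3} {
		ap, err := netip.ParseAddrPort(s)
		if err != nil {
			return Probe{}, fmt.Errorf("parsing address %d (%q): %w", i, s, err)
		}
		*dsts[i] = ap
	}
	return p, nil
}

// Mapping applies the RFC 5780 section 4.3 comparisons in order.
func (p Probe) Mapping() MappingBehavior {
	switch {
	case p.Test1 == p.Local:
		return NoNAT // the server saw our own address: no translation
	case p.Test2 == p.Test1:
		return EndpointIndependent // same mapping for a different destination
	case p.Test3 == p.Test2:
		return AddressDependent // mapping changes per destination IP only
	default:
		return AddressAndPortDependent // mapping changes per destination IP and port
	}
}
```

Address-and-port-dependent mapping is what is often called a "symmetric" NAT, and it is the category most likely to defeat hole punching.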

We may need to know the NAT types of both parties for each traversal failure, so that we can determine whether we're dealing with an unworkable network composition. However, given the very small number of uncensored peers presently on the network -- I think it's just a few peers we've daemonized on DO, plus 3 or 4 Lantern employees -- it might just be easier for all of us to manually determine our NAT types and factor that into the research here.
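To make "unworkable network composition" concrete, here is a sketch of a common rule of thumb for UDP hole punching, expressed over the RFC 4787 mapping and filtering categories. It is a simplification under stated assumptions, not a guarantee: real NATs can also do port preservation, hairpinning, or change behavior under load, and the `NAT` type and `likelyTraversable` function are hypothetical names.

```go
package main

// Behavior is an RFC 4787 category, used here for both mapping and filtering.
type Behavior int

const (
	EndpointIndependent Behavior = iota
	AddressDependent
	AddressAndPortDependent
)

// NAT describes one peer's NAT (hypothetical type for illustration).
type NAT struct {
	Mapping   Behavior
	Filtering Behavior
}

// likelyTraversable applies a rule of thumb for direct UDP hole punching:
//   - both sides endpoint-independent mapping: expected to work
//   - one side with dependent mapping: works only if the other side's
//     filtering ignores the source port, since the dependent side's packets
//     arrive from a port the other peer could not have predicted
//   - both sides with dependent mapping: expected to fail without a relay
func likelyTraversable(a, b NAT) bool {
	aEIM := a.Mapping == EndpointIndependent
	bEIM := b.Mapping == EndpointIndependent
	switch {
	case aEIM && bEIM:
		return true
	case aEIM:
		return a.Filtering != AddressAndPortDependent
	case bEIM:
		return b.Filtering != AddressAndPortDependent
	default:
		return false
	}
}
```

Under this rule, a peer behind a symmetric NAT paired with a peer behind a port-restricted NAT would be counted as unworkable, so those pairs could be excluded when looking for failures that have some other cause.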

noahlevenson commented 1 year ago

Just dumping an update:

We instrumented NAT traversal and added NAT behavior discovery. Traces will start arriving when the new Flashlight builds go out. We'll probably only need a few hours' worth of traces to be able to deduce what's going on.

noahlevenson commented 11 months ago

Another update:

I'm still waiting for NAT traces to appear in Honeycomb. I'm assuming that application updates just haven't been pushed yet, or haven't been pushed widely enough. Once the trace data appears, we can start debugging and developing a plan.

In the meantime, as a ham-fisted workaround, we disabled Broflake for all mobile users. The effect is positive -- it seems that desktop users can pierce their NATs at a greater rate, which produces more activity in widget users' clients.

I'd like to keep this issue open until we're able to view those NAT traces and come up with a hypothesis.