getlantern / browsersunbounded

Interoperable browser-based P2P proxies for censorship circumvention
GNU General Public License v3.0
14 stars 0 forks source link

Bump to quic-go 0.40 #207

Closed noahlevenson closed 10 months ago

noahlevenson commented 10 months ago

This resolves the issues related to new versions of quic-go and bumps us to the newest version, which is 0.40. I'm now able to build everything using Go 1.20.11.

Basically, new versions of quic-go have introduced two nasty lil changes:

  1. They changed the signature and behavior of quic.Dial -- instead of timing out and returning an error if the handshake fails, it now obeys a context. Previously, we relied on the timeout and error behavior.
  2. They're now much more persnickety about the net.PacketConn you build your QUIC connection atop of. Previously, that net.PacketConn wasn't required to implement read or write deadlines. Now it requires correctly implemented read deadlines. So I had to make our BroflakeConn do all the SetReadDeadline stuff and obey the deadline properly. Fun times.

This must be merged alongside this change to Flashlight: https://github.com/getlantern/flashlight/pull/1332

myleshorton commented 10 months ago

The 5 second dial timeout does seem a touch low to me tbh. For me, for example, SSHing to some far away machine can sometimes take in that range. I think Linux defaults TCP connection timeouts to 60 seconds! I think 8 or thereabouts is probably fine?

Crosse commented 10 months ago

I checked out this branch and it built with go version go1.21.1 linux/amd64 just fine, as well. Then I pointed lantern-cloud at this branch (with a replace statement in l-c's go.mod). Everything built and all tests passed.

(I realize this doesn't exercise the actual functionality too much, but I wanted to make both of those things known.)

myleshorton commented 10 months ago

Nice! The steps in the README work great, although the sequence for running Freddie is slightly different now.

Looking good!

noahlevenson commented 10 months ago

The 5 second dial timeout does seem a touch low to me tbh. For me, for example, SSHing to some far away machine can sometimes take in that range. I think Linux defaults TCP connection timeouts to 60 seconds! I think 8 or thereabouts is probably fine?

Cool -- bumped it to 8 here: https://github.com/getlantern/broflake/pull/207/commits/36fc51b6f05c9a9f55f62c0dcd51595089f152f3

noahlevenson commented 10 months ago

Nice! The steps in the README work great, although the sequence for running Freddie is slightly different now.

Looking good!

Fixed here: https://github.com/getlantern/broflake/pull/207/commits/fb8392c78271714aeb4b958da79ac2af3442dd08

noahlevenson commented 10 months ago

Just FYI, I've been giving it an aggressive poke this morning, and I was able to stimulate a panic -- so I'm gonna suss this out before merging.

AFAICT, the repro is: start a desktop client, start a widget, let them signal and establish a QUIC connection. Then disconnect the widget and let the desktop client run through its reset behaviors.

But I haven't been able to stimulate the panic every time that way...

2023/11/16 11:38:36 QUIC connection error (timeout: no recent network activity), closing!
panic: connection already exists

goroutine 2848 [running]:
github.com/quic-go/quic-go.(*connMultiplexer).AddConn(0xc00066a4a0, {0x7fcbe6deed70, 0xc00010ef00})
        /home/noah/go/pkg/mod/github.com/quic-go/quic-go@v0.40.0/multiplexer.go:59 +0x21a
github.com/quic-go/quic-go.(*Transport).init.func1()
        /home/noah/go/pkg/mod/github.com/quic-go/quic-go@v0.40.0/transport.go:241 +0x766
sync.(*Once).doSlow(0xc0002fe248, 0xc0008f7d28)
        /home/noah/sdk/go1.20.11/src/sync/once.go:74 +0x102
sync.(*Once).Do(0xc0002fe248, 0xc0008f7f80?)
        /home/noah/sdk/go1.20.11/src/sync/once.go:65 +0x47
github.com/quic-go/quic-go.(*Transport).init(0xc0002fe1e0, 0x1)
        /home/noah/go/pkg/mod/github.com/quic-go/quic-go@v0.40.0/transport.go:200 +0x68
github.com/quic-go/quic-go.(*Transport).dial(0xc0002fe1e0, {0x1116f68, 0xc00089cde0}, {0x1114fd8, 0x110ea20}, {0x0, 0x0}, 0x19?, 0x0?, 0x0)
        /home/noah/go/pkg/mod/github.com/quic-go/quic-go@v0.40.0/transport.go:186 +0xdb
github.com/quic-go/quic-go.(*Transport).Dial(...)
        /home/noah/go/pkg/mod/github.com/quic-go/quic-go@v0.40.0/transport.go:173
github.com/quic-go/quic-go.Dial({0x1116f68, 0xc00089cde0}, {0x1119d90?, 0xc00010ef00}, {0x1114fd8, 0x110ea20}, 0xc000220180, 0xc000bce600?)
        /home/noah/go/pkg/mod/github.com/quic-go/quic-go@v0.40.0/client.go:117 +0x20d
github.com/getlantern/broflake/clientcore.(*QUICLayer).DialAndMaintainQUICConnection.func1()
        /home/noah/work/broflake/clientcore/quic.go:89 +0x119
created by github.com/getlantern/broflake/clientcore.(*QUICLayer).DialAndMaintainQUICConnection
        /home/noah/work/broflake/clientcore/quic.go:87 +0x3aa
noahlevenson commented 10 months ago

The panic seems to be a canary they inserted due to breaking API changes:

https://github.com/quic-go/quic-go/blob/427f53328bb18ed1f84f352914cec805d0051494/multiplexer.go#L55

TBH, I'm kinda confused about what's going on. I think they're using the word "connection" in different ways in different places, hence their TODO to write a more descriptive panic message.

In particular, I'm not sure what "if we're already listening on this connection" means. Skimming the breaking changes notes, it seems this all has to do with reuse of the underlying net.PacketConn, so I'm assuming they're talking about listening on the net.PacketConn, which sort of makes sense.

I'll have to read all those breaking changes notes and see if I can figure out what's going on.

noahlevenson commented 10 months ago

Alright, I think I found the explanation:

Don’t use the same net.PacketConn for multiple calls to Dial and / or Listen. Instead, create a DialListener. Since using the same connection was the previously recommended API, and will lead to pretty subtle bugs if not migrated properly, quic-go will panic if the same net.PacketConn is used for calls to Dial and / or Listen, for some transition time (we’ll remove this behavior once most users have completed the migration).

Let's see what I can do...

noahlevenson commented 10 months ago

For those following along at home, DialListener seems to have eventually been renamed Transport...

noahlevenson commented 10 months ago

OK, this should do it: https://github.com/getlantern/broflake/pull/207/commits/15c26ab2dd539923d9580918591483d414e055c1

Gonna let this marinate until tomorrow morning, then I'll merge. Just want to spend a little more time poking it.

Crosse commented 10 months ago

Good lord.