Closed noahlevenson closed 10 months ago
The 5 second dial timeout does seem a touch low to me tbh. For me, for example, SSHing to some far away machine can sometimes take in that range. I think Linux defaults TCP connection timeouts to 60 seconds! I think 8 or thereabouts is probably fine?
I checked out this branch and it built with go version go1.21.1 linux/amd64
just fine, as well. Then I pointed lantern-cloud at this branch (with a replace
statement in l-c's go.mod
). Everything built and all tests passed.
(I realize this doesn't exercise the actual functionality too much, but I wanted to make both of those things known.)
Nice! The steps in the README work great, although the sequence for running Freddie is slightly different now.
Looking good!
The 5 second dial timeout does seem a touch low to me tbh. For me, for example, SSHing to some far away machine can sometimes take in that range. I think Linux defaults TCP connection timeouts to 60 seconds! I think 8 or thereabouts is probably fine?
Cool -- bumped it to 8 here: https://github.com/getlantern/broflake/pull/207/commits/36fc51b6f05c9a9f55f62c0dcd51595089f152f3
Nice! The steps in the README work great, although the sequence for running Freddie is slightly different now.
Looking good!
Fixed here: https://github.com/getlantern/broflake/pull/207/commits/fb8392c78271714aeb4b958da79ac2af3442dd08
Just FYI, I've been giving it an aggressive poke this morning, and I was able to stimulate a panic -- so I'm gonna suss this out before merging.
AFAICT, the repro is: start a desktop client, start a widget, let them signal and establish a QUIC connection. Then disconnect the widget and let the desktop client run through its reset behaviors.
But I haven't been able to stimulate the panic every time that way...
2023/11/16 11:38:36 QUIC connection error (timeout: no recent network activity), closing!
panic: connection already exists
goroutine 2848 [running]:
github.com/quic-go/quic-go.(*connMultiplexer).AddConn(0xc00066a4a0, {0x7fcbe6deed70, 0xc00010ef00})
/home/noah/go/pkg/mod/github.com/quic-go/quic-go@v0.40.0/multiplexer.go:59 +0x21a
github.com/quic-go/quic-go.(*Transport).init.func1()
/home/noah/go/pkg/mod/github.com/quic-go/quic-go@v0.40.0/transport.go:241 +0x766
sync.(*Once).doSlow(0xc0002fe248, 0xc0008f7d28)
/home/noah/sdk/go1.20.11/src/sync/once.go:74 +0x102
sync.(*Once).Do(0xc0002fe248, 0xc0008f7f80?)
/home/noah/sdk/go1.20.11/src/sync/once.go:65 +0x47
github.com/quic-go/quic-go.(*Transport).init(0xc0002fe1e0, 0x1)
/home/noah/go/pkg/mod/github.com/quic-go/quic-go@v0.40.0/transport.go:200 +0x68
github.com/quic-go/quic-go.(*Transport).dial(0xc0002fe1e0, {0x1116f68, 0xc00089cde0}, {0x1114fd8, 0x110ea20}, {0x0, 0x0}, 0x19?, 0x0?, 0x0)
/home/noah/go/pkg/mod/github.com/quic-go/quic-go@v0.40.0/transport.go:186 +0xdb
github.com/quic-go/quic-go.(*Transport).Dial(...)
/home/noah/go/pkg/mod/github.com/quic-go/quic-go@v0.40.0/transport.go:173
github.com/quic-go/quic-go.Dial({0x1116f68, 0xc00089cde0}, {0x1119d90?, 0xc00010ef00}, {0x1114fd8, 0x110ea20}, 0xc000220180, 0xc000bce600?)
/home/noah/go/pkg/mod/github.com/quic-go/quic-go@v0.40.0/client.go:117 +0x20d
github.com/getlantern/broflake/clientcore.(*QUICLayer).DialAndMaintainQUICConnection.func1()
/home/noah/work/broflake/clientcore/quic.go:89 +0x119
created by github.com/getlantern/broflake/clientcore.(*QUICLayer).DialAndMaintainQUICConnection
/home/noah/work/broflake/clientcore/quic.go:87 +0x3aa
The panic seems to be a canary they inserted due to breaking API changes:
https://github.com/quic-go/quic-go/blob/427f53328bb18ed1f84f352914cec805d0051494/multiplexer.go#L55
TBH, I'm kinda confused about what's going on. I think they're using the word "connection" in different ways in different places, hence their TODO to write a more descriptive panic message.
In particular, I'm not sure what "if we're already listening on this connection" means. Skimming the breaking changes notes, it seems this all has to do with reuse of the underlying net.PacketConn
, so I'm assuming they're talking about listening on the net.PacketConn
, which sort of makes sense.
I'll have to read all those breaking changes notes and see if I can figure out what's going on.
Alright, I think I found the explanation:
Don’t use the same net.PacketConn for multiple calls to Dial and / or Listen. Instead, create a DialListener. Since using the same connection was the previously recommended API, and will lead to pretty subtle bugs if not migrated properly, quic-go will panic if the same net.PacketConn is used for calls to Dial and / or Listen, for some transition time (we’ll remove this behavior once most users have completed the migration).
Let's see what I can do...
For those following along at home, DialListener
seems to have eventually been renamed Transport
...
OK, this should do it: https://github.com/getlantern/broflake/pull/207/commits/15c26ab2dd539923d9580918591483d414e055c1
Gonna let this marinate until tomorrow morning, then I'll merge. Just want to spend a little more time poking it.
Good lord.
This resolves the issues related to new versions of quic-go and bumps us to the newest version, which is 0.40. I'm now able to build everything using Go 1.20.11.
Basically, new versions of quic-go have introduced two nasty lil changes:
quic.Dial
-- instead of timing out and returning an error if the handshake fails, it now obeys a context. Previously, we relied on the timeout and error behavior.net.PacketConn
you build your QUIC connection atop of. Previously, thatnet.PacketConn
wasn't required to implement read or write deadlines. Now it requires correctly implemented read deadlines. So I had to make ourBroflakeConn
do all theSetReadDeadline
stuff and obey the deadline properly. Fun times.This must be merged alongside this change to Flashlight: https://github.com/getlantern/flashlight/pull/1332