getlantern / lantern-client

Lantern Client code
GNU General Public License v3.0

Another attempt to resolve macOS crash #1132

Closed atavism closed 3 months ago

atavism commented 4 months ago

For https://github.com/getlantern/engineering/issues/1504

Replaces https://github.com/getlantern/lantern-client/pull/1130

Improved error checking and a couple of bug fixes.

How to reproduce the issue

signal 16 received but handler not on signal stack
mp.gsignal stack [0x14000084000 0x1400008c000], mp.g0 stack [0x16d62c000 0x16d82f000], sp=0x14003587588
fatal error: non-Go code set up signal handler without SA_ONSTACK flag

runtime stack:
runtime.throw({0x16f66699d?, 0x0?})
    runtime/panic.go:1023 +0x40 fp=0x140035874e0 sp=0x140035874b0 pc=0x16e94be10
runtime.sigNotOnStack(0x10, 0x14003587588, 0x14000080008)
    runtime/signal_unix.go:1065 +0x118 fp=0x14003587510 sp=0x140035874e0 pc=0x16e966878
runtime.adjustSignalStack(0x10, 0x14000080008, 0x140035875b8)
    runtime/signal_unix.go:592 +0x25c fp=0x14003587580 sp=0x14003587510 pc=0x16e9656dc
runtime.sigtrampgo(0x10, 0x14003587720, 0x14003587788)
    runtime/signal_unix.go:480 +0x8c fp=0x14003587600 sp=0x14003587580 pc=0x16e96521c
runtime.sigtrampgo(0x10, 0x14003587720, 0x14003587788)
    <autogenerated>:1 +0x1c fp=0x14003587630 sp=0x14003587600 pc=0x16e98b54c
runtime.sigtramp()
    runtime/sys_darwin_arm64.s:227 +0x4c fp=0x140035876f0 sp=0x14003587630 pc=0x16e98a01c

Full crash logs: desktop_crash.txt

atavism commented 4 months ago

@jigar-f It looks like auto-updates may be causing the desktop app to crash consistently. If I remove the call to autoupdate.Configure...

// Configure sets the CA certificate to pin for the TLS auto-update connection.
func Configure(updateURL, updateCA string, iconURL func() string) {
    setUpdateURL(updateURL)
    fnIconURL = iconURL
    httpClient.Store(
        &http.Client{
            Transport: proxied.ChainedThenFrontedWith(updateCA),
        })
    enableAutoupdate()
}

...the app no longer crashes. The stack trace included the following error:

github.com/getlantern/go-update/check.(*Params).CheckForUpdate(0x140017f1560, {0x14001bab3e0, 0x2c}, 0x14001e6a150)
    github.com/getlantern/go-update@v0.0.0-20230221120840-8d795213a8bc/check/check.go:178 +0x47c fp=0x14002e71c90 sp=0x14002e71a90 pc=0x12507c48c
github.com/getlantern/autoupdate.(*Config).check(0x14002e71f08)
    github.com/getlantern/autoupdate@v0.0.0-20211217175350-d0b211f39ba7/autoupdate.go:128 +0x230 fp=0x14002e71d00 sp=0x14002e71c90 pc=0x12507df40
github.com/getlantern/autoupdate.(*Config).loop(0x14002e71f08)
    github.com/getlantern/autoupdate@v0.0.0-20211217175350-d0b211f39ba7/autoupdate.go:71 +0x38 fp=0x14002e71df0 sp=0x14002e71d00 pc=0x12507d7f8
github.com/getlantern/autoupdate.ApplyNext(0x1400140ff08)
    github.com/getlantern/autoupdate@v0.0.0-20211217175350-d0b211f39ba7/autoupdate.go:66 +0x19c fp=0x14002e71ea0 sp=0x14002e71df0 pc=0x12507d79c
github.com/getlantern/lantern-client/desktop/autoupdate.watchForUpdate()
    github.com/getlantern/lantern-client/desktop/autoupdate/autoupdate.go:78 +0x194 fp=0x14002e71fd0 sp=0x14002e71ea0 pc=0x12507f594
runtime.goexit({})
    runtime/asm_arm64.s:1222 +0x4 fp=0x14002e71fd0 sp=0x14002e71fd0 pc=0x12457c5c4
created by github.com/getlantern/lantern-client/desktop/autoupdate.enableAutoupdate.func1 in goroutine 149
    github.com/getlantern/lantern-client/desktop/autoupdate/autoupdate.go:62 +0x24

atavism commented 4 months ago

Hmmm, we are no longer calling i18nInit. When we send notifications, we expect those translations to be available. That's probably causing problems as well.

atavism commented 4 months ago

OK, I'm definitely curious whether the app still crashes for you with these changes, @jigar-f. Could you please take another look?

jigar-f commented 4 months ago

Thanks, @atavism, for making all those changes. I am testing them now.

jigar-f commented 4 months ago

So I just ran those changes after generating bindings, and I got the same error I was getting before. We really need to investigate other code and then go. But thanks for fixing the other couple of bugs. Appreciate it.

signal 16 received but handler not on signal stack
mp.gsignal stack [0x14000084000 0x1400008c000], mp.g0 stack [0x16d62c000 0x16d82f000], sp=0x14003587588
fatal error: non-Go code set up signal handler without SA_ONSTACK flag

runtime stack:
runtime.throw({0x16f66699d?, 0x0?})
    runtime/panic.go:1023 +0x40 fp=0x140035874e0 sp=0x140035874b0 pc=0x16e94be10
runtime.sigNotOnStack(0x10, 0x14003587588, 0x14000080008)
    runtime/signal_unix.go:1065 +0x118 fp=0x14003587510 sp=0x140035874e0 pc=0x16e966878
runtime.adjustSignalStack(0x10, 0x14000080008, 0x140035875b8)
    runtime/signal_unix.go:592 +0x25c fp=0x14003587580 sp=0x14003587510 pc=0x16e9656dc
runtime.sigtrampgo(0x10, 0x14003587720, 0x14003587788)
    runtime/signal_unix.go:480 +0x8c fp=0x14003587600 sp=0x14003587580 pc=0x16e96521c
runtime.sigtrampgo(0x10, 0x14003587720, 0x14003587788)
    <autogenerated>:1 +0x1c fp=0x14003587630 sp=0x14003587600 pc=0x16e98b54c
runtime.sigtramp()
    runtime/sys_darwin_arm64.s:227 +0x4c fp=0x140035876f0 sp=0x14003587630 pc=0x16e98a01c

jigar-f commented 4 months ago

@atavism Just curious, did you get this crash too (have you tried running the app 2-3 times)? Or is it just mine that gets this crash again and again? If so, there might be something wrong with my Mac.

atavism commented 4 months ago

> So I just ran those changes after generating bindings, and I got the same error I was getting before. We really need to investigate other code and then go. But thanks for fixing the other couple of bugs. Appreciate it.

Thanks for testing. Yeah, sorry for the back and forth: the app isn't consistently crashing for me, so I'm just trying to narrow down the problem area.

> @atavism Just curious, did you get this crash too (have you tried running the app 2-3 times)? Or is it just mine that gets this crash again and again? If so, there might be something wrong with my Mac.

At least with these updates, the only way I can get the app to crash is to delete the application home directory and then run Lantern again.

atavism commented 4 months ago

@jigar-f Could you send your full logs when you get a chance? The idletiming- and cmux-related frames in the goroutine stack dump stand out to me:

goroutine 436 gp=0x140000021c0 m=nil [select]:
runtime.gopark(0x14000885d08?, 0x6?, 0x98?, 0x5b?, 0x14000885c9c?)
    runtime/proc.go:402 +0xc8 fp=0x14000885b40 sp=0x14000885b20 pc=0x12c8c1af8
runtime.selectgo(0x14000885d08, 0x14000885c90, 0x12e61a180?, 0x0, 0x14000885ca8?, 0x1)
    runtime/select.go:327 +0x614 fp=0x14000885c50 sp=0x14000885b40 pc=0x12c8d5074
github.com/xtaci/smux.(*Stream).waitRead(0x14006867d40)
    github.com/xtaci/smux@v1.5.24/stream.go:271 +0x19c fp=0x14000885d90 sp=0x14000885c50 pc=0x12d04484c
github.com/xtaci/smux.(*Stream).Read(0x14006867d40, {0x14002ce4000, 0x1000, 0x1000})
    github.com/xtaci/smux@v1.5.24/stream.go:73 +0x80 fp=0x14000885de0 sp=0x14000885d90 pc=0x12d0433e0
github.com/getlantern/cmux/v2.(*cmconn).Read(0x14000475600, {0x14002ce4000?, 0x14000885e68?, 0x12e61a180?})
    github.com/getlantern/cmux/v2@v2.0.0-20230301223233-dac79088a4c0/cmux.go:47 +0x34 fp=0x14000885e20 sp=0x14000885de0 pc=0x12d04ff04
github.com/getlantern/bufconn.(*conn).Read(0x152b4bf7fe48?, {0x14002ce4000?, 0x14000885e98?, 0x12e61a180?})
    github.com/getlantern/bufconn@v0.0.0-20210901195825-fd7c0267b493/bufconn.go:43 +0x84 fp=0x14000885e50 sp=0x14000885e20 pc=0x12d024314
github.com/getlantern/flashlight/v7/bandit.(*dataTrackingConn).Read(0x14002ed2fc0, {0x14002ce4000?, 0x14000885ea8?, 0x12e61a180?})
    github.com/getlantern/flashlight/v7@v7.6.90/bandit/bandit.go:245 +0x34 fp=0x14000885e80 sp=0x14000885e50 pc=0x12cdb4024
github.com/getlantern/flashlight/v7/client.(*proxiedConn).Read(0xc1a07efe87fc24c8?, {0x14002ce4000?, 0x12e61a180?, 0x12e61a180?})
    <autogenerated>:1 +0x34 fp=0x14000885eb0 sp=0x14000885e80 pc=0x12d3d7574
github.com/getlantern/netx.doCopy({0x12db54b48, 0x1400677a210}, {0x12db54ca8, 0x14002c6e360}, {0x14002ce4000, 0x1000, 0x1000}, 0x12c8fbde8?, 0x140019c0c54, 0x12db37c08, ...)
    github.com/getlantern/netx@v0.0.0-20240124040039-163b1628a66b/copy.go:87 +0x158 fp=0x14000885f60 sp=0x14000885eb0 pc=0x12cd203e8
github.com/getlantern/netx.BidiCopyWithOpts.gowrap2()
    github.com/getlantern/netx@v0.0.0-20240124040039-163b1628a66b/copy.go:61 +0x50 fp=0x14000885fd0 sp=0x14000885f60 pc=0x12cd201c0
runtime.goexit({})
    runtime/asm_arm64.s:1222 +0x4 fp=0x14000885fd0 sp=0x14000885fd0 pc=0x12c8fbde4
created by github.com/getlantern/netx.BidiCopyWithOpts in goroutine 258
    github.com/getlantern/netx@v0.0.0-20240124040039-163b1628a66b/copy.go:61 +0x380
...
github.com/refraction-networking/utls.(*Conn).Read(0x14000b53188, {0x140031f6000, 0x1000, 0x12e61a180?})
    github.com/refraction-networking/utls@v1.3.3/conn.go:1328 +0x168 fp=0x14003b3fb60 sp=0x14003b3faf0 pc=0x12cd5c968
github.com/getlantern/idletiming.(*IdleTimingConn).Read(0x14000301600, {0x140031f6000, 0x1000, 0x1000})
    github.com/getlantern/idletiming@v0.0.0-20231030193830-6767b09f86db/idletiming_conn.go:147 +0x2ec fp=0x14003b3fc60 sp=0x14003b3fb60 pc=0x12cda3abc
net/http.(*persistConn).Read(0x140031e66c0, {0x140031f6000?, 0x14003226d18?, 0x12cb87c90?})
    net/http/transport.go:1977 +0x50 fp=0x14003b3fcc0 sp=0x14003b3fc60 pc=0x12cb86db0
bufio.(*Reader).fill(0x140031bcba0)
    bufio/bufio.go:110 +0xf8 fp=0x14003b3fd00 sp=0x14003b3fcc0 pc=0x12caf1998
bufio.(*Reader).Peek(0x140031bcba0, 0x1)
    bufio/bufio.go:148 +0x60 fp=0x14003b3fd20 sp=0x14003b3fd00 pc=0x12caf1b00
net/http.(*persistConn).readLoop(0x140031e66c0)
    net/http/transport.go:2141 +0x158 fp=0x14003b3ffb0 sp=0x14003b3fd20 pc=0x12cb87d38
net/http.(*Transport).dialConn.gowrap2()
    net/http/transport.go:1799 +0x28 fp=0x14003b3ffd0 sp=0x14003b3ffb0 pc=0x12cb86348
runtime.goexit({})
    runtime/asm_arm64.s:1222 +0x4 fp=0x14003b3ffd0 sp=0x14003b3ffd0 pc=0x12c8fbde4
created by net/http.(*Transport).dialConn in goroutine 320
    net/http/transport.go:1799 +0x1018

goroutine 301 gp=0x1400716f180 m=nil [select]:
runtime.gopark(0x1400066ff38?, 0x2?, 0xc8?, 0xfd?, 0x1400066feb4?)
    runtime/proc.go:402 +0xc8 fp=0x1400066fd60 sp=0x1400066fd40 pc=0x12c8c1af8
runtime.selectgo(0x1400066ff38, 0x1400066feb0, 0x115?, 0x0, 0x0?, 0x1)
    runtime/select.go:327 +0x614 fp=0x1400066fe70 sp=0x1400066fd60 pc=0x12c8d5074
github.com/xtaci/smux.(*Session).sendLoop(0x14002a14000)
    github.com/xtaci/smux@v1.5.24/session.go:483 +0x120 fp=0x1400066ffb0 sp=0x1400066fe70 pc=0x12d042700
github.com/xtaci/smux.newSession.gowrap3()
    github.com/xtaci/smux@v1.5.24/session.go:114 +0x28 fp=0x1400066ffd0 sp=0x1400066ffb0 pc=0x12d03fc38
runtime.goexit({})
    runtime/asm_arm64.s:1222 +0x4 fp=0x1400066ffd0 sp=0x1400066ffd0 pc=0x12c8fbde4
created by github.com/xtaci/smux.newSession in goroutine 173
    github.com/xtaci/smux@v1.5.24/session.go:114 +0x364

Btw, if you run the app with GOTRACEBACK=all, that will dump the stack traces of all user-created goroutines.
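For cases where setting an environment variable is awkward (for example when the Go code is loaded as a library by the desktop shell), a minimal sketch of getting similar output from inside the process is shown here. It uses only the standard library: debug.SetTraceback accepts the same values as GOTRACEBACK, and runtime.Stack with its second argument set to true captures every goroutine on demand. The helper name allGoroutineStacks is made up for illustration and is not part of lantern-client.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// allGoroutineStacks is a hypothetical helper that returns the stack traces
// of every goroutine in the process, growing the buffer until the dump fits.
func allGoroutineStacks() string {
	buf := make([]byte, 64<<10)
	for {
		n := runtime.Stack(buf, true) // true = include all goroutines
		if n < len(buf) {
			return string(buf[:n])
		}
		buf = make([]byte, 2*len(buf))
	}
}

func main() {
	// Takes the same values as the GOTRACEBACK environment variable and
	// applies to any later crash traceback printed by this process.
	debug.SetTraceback("all")

	done := make(chan struct{})
	go func() { <-done }() // an extra goroutine so the dump has more than one entry

	fmt.Print(allGoroutineStacks())
	close(done)
}
```

Note that SetTraceback cannot lower the level below what GOTRACEBACK already requests, so the two approaches can be combined safely.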

atavism commented 4 months ago

Have you tried running the app with an earlier version of flashlight?

To experiment, I just pushed this branch that updates the current code to use flashlight v7.6.78

With those changes, the app no longer crashes when I delete the application home directory and rerun it. Let me keep digging into this.

atavism commented 4 months ago

@jigar-f Could you send your full logs when you get a chance?

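Another thing you could try is importing the net/http/pprof package, which registers HTTP handlers under /debug/pprof, and starting an HTTP server like this before you start Lantern:

import (
    "fmt"
    "net/http"
    _ "net/http/pprof" // side effect: registers /debug/pprof handlers on http.DefaultServeMux
)

//export start
func start() {
  // ...
  go func() {
    // A nil handler means http.DefaultServeMux; ListenAndServe only returns on error.
    fmt.Println(http.ListenAndServe("localhost:6060", nil))
  }()
}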

Then you can go to http://localhost:6060/debug/pprof/goroutine?debug=2 to get a full goroutine stack dump.

Here's mine
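If opening a local port is not convenient, the same dump can be produced without an HTTP server. The sketch below is an illustration rather than code from this PR: it writes the built-in "goroutine" profile with debug level 2, which is the text form that /debug/pprof/goroutine?debug=2 serves. The function name goroutineDump is hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/pprof"
)

// goroutineDump is a hypothetical helper that renders the built-in
// "goroutine" profile as text, one stack trace per goroutine.
func goroutineDump() (string, error) {
	var buf bytes.Buffer
	// Debug level 2 prints stacks in the same form as an unrecovered panic.
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	dump, err := goroutineDump()
	if err != nil {
		fmt.Fprintln(os.Stderr, "goroutine dump failed:", err)
		os.Exit(1)
	}
	fmt.Print(dump)
}
```

The returned string could just as well be written to the application log, which makes it easier to attach to an issue.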

jigar-f commented 4 months ago

> Thanks for testing. Yeah, sorry for the back and forth: the app isn't consistently crashing for me so I'm just trying to narrow down the problem area.

No worries at all. I like this kind of stuff, and I am learning a lot from all this. Happy to help.

jigar-f commented 4 months ago

@atavism Did you revert the flashlight version back to 7.6.90? Also, here is my full stack trace. I looked at your stack trace as well, and you don't have the same error as mine.

desktop_crash.txt

jigar-f commented 4 months ago

I tested this branch, and for me it is crashing with the same logs as above.

atavism commented 4 months ago

> @atavism Did you revert the flashlight version back to 7.6.90? Also, here is my full stack trace. I looked at your stack trace as well, and you don't have the same error as mine.

Thanks for sending that over, @jigar-f! Yeah, I did revert flashlight to 7.6.90.

atavism commented 4 months ago

If you recover from the panic in the start method like this, do you get any useful information? debug.Stack() should return the stack of the panic:

defer func() {
  if r := recover(); r != nil {
    fmt.Println("stacktrace from panic: \n" + string(debug.Stack()))
  }
}()
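A self-contained version of that pattern, as a sketch only: runGuarded and failingStart are hypothetical names, with failingStart standing in for the real start function. One caveat worth keeping in mind is that recover only intercepts Go panics. Fatal errors raised by the runtime itself, such as the "non-Go code set up signal handler without SA_ONSTACK flag" message in the logs above, terminate the process without running deferred functions, so this may not capture that particular crash.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// runGuarded calls fn and, if fn panics, returns the panic value together
// with the stack captured inside the deferred function.
func runGuarded(fn func()) (recovered any, stack string) {
	defer func() {
		if r := recover(); r != nil {
			recovered = r
			stack = string(debug.Stack())
		}
	}()
	fn()
	return nil, ""
}

// failingStart is a hypothetical stand-in for the real start function.
func failingStart() {
	var settings map[string]string
	settings["lang"] = "en" // assignment to a nil map panics
}

func main() {
	if r, stack := runGuarded(failingStart); r != nil {
		fmt.Printf("recovered: %v\nstacktrace from panic:\n%s", r, stack)
	}
}
```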