getsentry / sentry-java

A Sentry SDK for Java, Android and other JVM languages.
https://docs.sentry.io/
MIT License

SIGSEGV when user interaction instrumentation is enabled #3653

Open OlivierGenez opened 3 weeks ago

OlivierGenez commented 3 weeks ago

Integration

sentry-android

Build System

Gradle

AGP Version

8.3.2

Proguard

Disabled

Version

7.12.1

Steps to Reproduce

My team has observed an increase in this type of crash in Sentry/Android vitals with the latest update of our app:

Check failed: tlsPtr_.method_trace_buffer == nullptr (tlsPtr_.method_trace_buffer=0x<sanitized>, nullptr=(null)) 

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 0, tid: 28018 >>> <Application ID redacted> <<<

backtrace:
  #00  pc 0x0000000000058290  /apex/com.android.runtime/lib64/bionic/libc.so (__strlen_aarch64+16)
  #01  pc 0x00000000005b510c  /apex/com.android.art/lib64/libart.so (art::Thread::DumpState(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, art::Thread const*, int)+556)
  #02  pc 0x00000000005b487c  /apex/com.android.art/lib64/libart.so (art::Thread::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, unwindstack::AndroidLocalUnwinder&, bool, bool) const+52)
  #03  pc 0x00000000005b6814  /apex/com.android.art/lib64/libart.so (art::DumpCheckpoint::Run(art::Thread*)+216)
  #04  pc 0x000000000054eeb0  /apex/com.android.art/lib64/libart.so (art::ThreadList::RunCheckpoint(art::Closure*, art::Closure*, bool)+684)
  #05  pc 0x00000000005b6148  /apex/com.android.art/lib64/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, bool)+292)
  #06  pc 0x0000000000933e24  /apex/com.android.art/lib64/libart.so (art::AbortState::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) const+204)
  #07  pc 0x000000000093023c  /apex/com.android.art/lib64/libart.so (art::Runtime::Abort(char const*)+712)
  #08  pc 0x00000000000160fc  /apex/com.android.art/lib64/libbase.so (android::base::SetAborter(std::__1::function<void (char const*)>&&)::$_0::__invoke(char const*)+80)
  #09  pc 0x00000000000156d0  /apex/com.android.art/lib64/libbase.so (android::base::LogMessage::~LogMessage()+516)
  #10  pc 0x00000000005b74ec  /apex/com.android.art/lib64/libart.so (art::Thread::~Thread()+1512)
  #11  pc 0x000000000030b2b4  /apex/com.android.art/lib64/libart.so (art::ThreadList::Unregister(art::Thread*, bool)+708)
  #12  pc 0x000000000063eec8  /apex/com.android.art/lib64/libart.so (art::Thread::CreateCallback(void*)+2208)
  #13  pc 0x000000000063e618  /apex/com.android.art/lib64/libart.so (art::Thread::CreateCallbackWithUffdGc(void*)+8)
  #14  pc 0x000000000006efbc  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+204)
  #15  pc 0x0000000000060d60  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)

This is not a new issue (we've seen reports as far back as a year ago) but there has been a significant increase in crash reports.

Our app's sentry config has user interaction instrumentation enabled:

SentryAndroid.init(context) { options ->
    // [...]
    options.tracesSampleRate = 1.0
    options.profilesSampleRate = 1.0
    // [...]
    options.isEnableUserInteractionTracing = true
    // [...]
}

After some investigation, we've been able to replicate the issue in the debug version of our app (i.e., R8 is disabled) on Pixel 6a and Pixel 7a devices with Android 14 by:

  1. opening the app
  2. tapping any item in our bottom navigation bar in very rapid succession until the app crashes

Based on Sentry/Android vitals crash reports, this definitely occurs on a wide variety of devices with standard app usage; the steps above are just one way we've been able to replicate the issue somewhat consistently.

Expected Result

The application proceeds as normal and doesn't crash.

Actual Result

After a while, the interactions slow down a bit, then the application crashes with the same check failure and backtrace shown under Steps to Reproduce.

Attached is a full crash dump: tombstone.txt.

The issue cannot be replicated when user interaction instrumentation is disabled:

SentryAndroid.init(context) { options ->
    // [...]
    options.isEnableUserInteractionTracing = false
    // [...]
}

ash-wtag commented 2 weeks ago

I am facing the same issue, any update on this?

Sentry version:

io.sentry.android.gradle:4.5.1
io.sentry:sentry-android:6.19.0

SentryAndroid.init(app) { options: SentryAndroidOptions ->
    options.dsn = token
    options.environment = buildType
    options.release = releaseName
}

Here's the stack trace:

Check failed: tlsPtr_.method_trace_buffer == nullptr (tlsPtr_.method_trace_buffer=0x<sanitized>, nullptr=(null)) 

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 0, tid: 17593 >>> ch.pickebike <<<

backtrace:
  #00  pc 0x0000000000097390  /apex/com.android.runtime/lib64/bionic/libc.so (__strlen_aarch64+16)
  #01  pc 0x00000000005b510c  /apex/com.android.art/lib64/libart.so (art::Thread::DumpState(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, art::Thread const*, int)+556)
  #02  pc 0x00000000005b487c  /apex/com.android.art/lib64/libart.so (art::Thread::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, unwindstack::AndroidLocalUnwinder&, bool, bool) const+52)
  #03  pc 0x00000000005b6814  /apex/com.android.art/lib64/libart.so (art::DumpCheckpoint::Run(art::Thread*)+216)
  #04  pc 0x000000000054eeb0  /apex/com.android.art/lib64/libart.so (art::ThreadList::RunCheckpoint(art::Closure*, art::Closure*, bool)+684)
  #05  pc 0x00000000005b6148  /apex/com.android.art/lib64/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, bool)+292)
  #06  pc 0x0000000000933e24  /apex/com.android.art/lib64/libart.so (art::AbortState::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) const+204)
  #07  pc 0x000000000093023c  /apex/com.android.art/lib64/libart.so (art::Runtime::Abort(char const*)+712)
  #08  pc 0x00000000000160fc  /apex/com.android.art/lib64/libbase.so (android::base::SetAborter(std::__1::function<void (char const*)>&&)::$_0::__invoke(char const*)+80)
  #09  pc 0x00000000000156d0  /apex/com.android.art/lib64/libbase.so (android::base::LogMessage::~LogMessage()+516)
  #10  pc 0x00000000005b74ec  /apex/com.android.art/lib64/libart.so (art::Thread::~Thread()+1512)
  #11  pc 0x000000000030b2b4  /apex/com.android.art/lib64/libart.so (art::ThreadList::Unregister(art::Thread*, bool)+708)
  #12  pc 0x000000000063eec8  /apex/com.android.art/lib64/libart.so (art::Thread::CreateCallback(void*)+2208)
  #13  pc 0x000000000010ba80  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208)
  #14  pc 0x000000000009f690  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)

markushi commented 2 weeks ago

Hey everyone, thanks for reaching out!

This looks like another issue with Android's built-in profiler, similar to https://github.com/getsentry/sentry-java/issues/2604 and https://github.com/getsentry/sentry-java/issues/3561.

Disabling user interaction instrumentation just hides the real culprit: user interaction instrumentation creates transactions, which in turn create profiles, and those profiles use the built-in Android profiler.

Could you try to disable profiling instead?

SentryAndroid.init(context) { options ->
    options.profilesSampleRate = 0.0
}
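
To make the chain above concrete, here is a minimal sketch of how the two sample rates combine. It assumes, as the Sentry documentation describes, that profilesSampleRate is applied relative to tracesSampleRate, so only transactions that were already sampled can be profiled. The helper effectiveProfileRate is hypothetical and not part of the SDK.

```kotlin
// Hypothetical helper, not a Sentry API: models how the two rates combine,
// assuming profilesSampleRate only applies to transactions that were
// already sampled by tracesSampleRate.
fun effectiveProfileRate(tracesSampleRate: Double, profilesSampleRate: Double): Double {
    require(tracesSampleRate in 0.0..1.0) { "tracesSampleRate must be in [0, 1]" }
    require(profilesSampleRate in 0.0..1.0) { "profilesSampleRate must be in [0, 1]" }
    return tracesSampleRate * profilesSampleRate
}

fun main() {
    // Configuration from the original report: every transaction is profiled.
    println(effectiveProfileRate(1.0, 1.0)) // prints 1.0
    // Suggested change: transactions are still created, none are profiled.
    println(effectiveProfileRate(1.0, 0.0)) // prints 0.0
}
```

With the reporter's settings (both rates at 1.0), every transaction started by a tap would also start a profile, which is why rapid tapping exercises the built-in profiler so heavily.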

On top of that: Is your app using any native (C/C++) code in combination with some custom threading?

OlivierGenez commented 2 weeks ago

> Could you try to disable profiling instead?

We actually had tried this when debugging the issue and found that it seemed to prevent crashes from happening as well. Would you advise disabling profiling instead of user interaction instrumentation?

> On top of that: Is your app using any native (C/C++) code in combination with some custom threading?

Our app doesn't use native code "directly", but some libraries we depend on do. The code is not open source though and is not shared with us, so I can't tell exactly how it deals with threading.

kahest commented 2 weeks ago

For reference:

markushi commented 2 weeks ago

> > Could you try to disable profiling instead?
>
> [...] Would you advise disabling profiling instead of user interaction instrumentation?

@OlivierGenez Yes, we would advise disabling profiling in the meantime instead.

markushi commented 2 weeks ago

Let's try to reproduce this issue in a minimal environment (Android 14, as seen in the attached tombstone).

kahest commented 1 week ago

Update from Google on the issue tracker:

We have shared this with our product and engineering team and will update this issue with more information as it becomes available.

markushi commented 1 week ago

@OlivierGenez

> My team has observed an increase in this type of crashes in Sentry/Android vitals with the latest update of our app

Did you make any configuration changes in the "latest update" of your app? E.g. did you change the sampling rate, enable a specific feature, bump an SDK version, etc.?

empowerDan commented 3 days ago

Hi @markushi, just a heads up that this will occur even with options.profilesSampleRate = 0.0. It's also happening on Android 12, 13 and 14.

ashwin-coles commented 3 days ago

I can confirm that even after our latest update, which disables profiling, we are still observing crashes. As @OlivierGenez mentioned, turning off profiling and disabling isEnableUserInteractionTracing reduced the number of crashes resulting from aggressive monkey-taps, but lifecycle events seem to be the last listed event in the breadcrumbs of some crashes. We have now disabled all tracing and are waiting to see if that helps at all.

options.isEnableActivityLifecycleTracingAutoFinish = false
options.isEnableAutoActivityLifecycleTracing = false
options.isEnableTimeToFullDisplayTracing = false
options.isEnableUserInteractionTracing = false
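
For context, a minimal sketch of where these four flags would sit in an app's setup. The SampleApp class is a hypothetical Application subclass, and the DSN and any other options are assumed to be configured elsewhere; only the flags themselves come from the snippet above.

```kotlin
import android.app.Application
import io.sentry.android.core.SentryAndroid

// Hypothetical Application subclass used only for illustration.
class SampleApp : Application() {
    override fun onCreate() {
        super.onCreate()
        SentryAndroid.init(this) { options ->
            // No automatic transactions for activity lifecycle events.
            options.isEnableAutoActivityLifecycleTracing = false
            // No auto-finishing of activity lifecycle transactions.
            options.isEnableActivityLifecycleTracingAutoFinish = false
            // No time-to-full-display tracing.
            options.isEnableTimeToFullDisplayTracing = false
            // No automatic transactions for user interactions.
            options.isEnableUserInteractionTracing = false
        }
    }
}
```

This turns off the automatic tracing integrations only; it is a workaround under the assumptions above, not a confirmed fix.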

romtsn commented 2 days ago

@empowerDan @ashwin-coles could you share the backtrace of these crashes (after disabling profiling)? Is it the same as the other ones in this thread?

empowerDan commented 2 days ago

Yep, same. I can also confirm that @ashwin-coles's snippet brings all art::Thread::DumpState errors down to 0. However, it silences quite a lot of other things too, so it's not a very viable long-term solution for a paying customer.

options.isEnableActivityLifecycleTracingAutoFinish = false
options.isEnableAutoActivityLifecycleTracing = false
options.isEnableTimeToFullDisplayTracing = false
options.isEnableUserInteractionTracing = false

Do we know if the issue occurs on previous versions of Sentry too?