getsentry / sentry-native

Sentry SDK for C, C++ and native applications.
MIT License
393 stars 165 forks source link

Use Android system unwinder if API >= 28 #743

Open marandaneto opened 2 years ago

marandaneto commented 2 years ago

Android API 28 ships the system unwinder if API >= 28 https://android.googlesource.com/platform/system/unwinding/+/refs/heads/master/libunwindstack/AndroidVersions.md#android-9-pie_api-level-28

Today we have a stripped-down version of this library to be back compatible with API >= 16 Because of that, we don't benefit from new features or bundle size gains.

We can't get rid of it right now to gain bundle size, again, because of back compatibility. What we can do tho is to actually use the Android system unwinder at runtime if API >= 28, so at least, we could benefit from likely better stack trace and its frames.

Related issue to bundle size: https://github.com/getsentry/sentry-native/issues/700

supervacuus commented 2 years ago

Thanks, @marandaneto. This issue spans four topics:

It is unclear yet if all of them make sense and in which order we'll approach them (except for the upstream changes, which I'll do first). But I tend towards a runtime solution as the one with the broadest value (even if it means keeping the fallback).

marandaneto commented 2 years ago

Option 1 is a good first step. Option 2 wouldn't work because we pre-compile with the min. supported version and ship it as a dependency of the Sentry Android SDK. Option 3 is my suggestion, as long as we can dynamically load the system unwinder (runtime dependency), we are good, otherwise, we fall back to the pre-compiled and stripped-down (pre-compiled and shipped as a dependency) version of libunwindstack. Option 4 I believe this causes a compilation issue < API 21, so not sure if this would be possible because we pre-compile the library.

I'd focus on Options 1 and 3. Option 4 would be nice if possible, so Apps using API >= 21 would benefit from it. Option 2 would have a huge impact on bundle size, even offering it as a separate package would be a problem because I believe the vast majority of Apps are not targeting minAPI 28, actually, the default today is still between 21 and 23. We do know people using lower than 21 tho, so it's harder to drop support for them.

supervacuus commented 2 years ago

Thanks that is valuable feedback. If most clients use pre-compiled packages, we can defer options 2 and 4, which will only make sense if we also have build/deploy use-cases for sentry-native with respective APIs.

I will focus on 1 and 3, then.

supervacuus commented 1 year ago

dear @marandaneto, @Swatinem, and @ashwoods (@krystofwoldrich, this is only for completeness' sake since you helped me understand the issue in RN turbo modules).

Here is a short update on what has been happening, where we hit dead ends, and how we should proceed. Also within the context of https://github.com/getsentry/sentry-react-native/issues/2226 but relevant to essentially all native stack-trace needs on the android platform.

Let's first go back to our two options:

  1. updating libunwindstack-ndk (i.e., our stripped-down version): this is done (https://github.com/getsentry/libunwindstack-ndk/commit/90d80b7073ae2108f7f42e2f576bddf988069ba2, https://github.com/getsentry/libunwindstack-ndk/commit/5a3cf38dbaa9603dc0943fa0fb1ad69d52799af0, https://github.com/getsentry/sentry-native/commit/40c35354b1d778411bbfcef53ec25be53aa29f18). We can release this as a minor version because most changes will have little effect on our use case. This was still a useful task as it was clear from the get-go that we would need this dependency for a long time as sentry-android supports SDK versions < 28. There were a lot of cosmetic and infrastructure changes (affecting all files), and upstream also moved, so having a current sync state will allow us to update more regularly.
  2. dynamically loading libunwindstack when running on Android SDK version >= 28. I think I covered a lot of detail of the approach here: https://github.com/getsentry/sentry-native/pull/752, but as a summary:
    • libunwindstack is unavailable as a user-space library to NDK developers. And it doesn't look like this will change soon since it is meant as an unwinder for platform developers.
    • we cannot directly open and map to the symbols of libunwindstack in the system-library folder because Android prevents this since N.
    • libunwind is part of the NDK but only as a C++ exception unwinder and also not exposed as a stable API to NDK devs.
    • The only officially exposed unwinder on the NDK side is _Unwind_Backtrace, which is useful if you want a stack trace from the current point of execution (i.e., the pc), but does not provide an interface for stack traces relative to an arbitrary address (which you want in case of a crash, where you start the trace from the context you get from the signal-handler). This is also true for the <exec_info.h> backtrace implementation on Android, which is implemented in terms of _Unwind_Backtrace. It does contain a few minor improvements compared to our stripped-down version (for instance, it eliminates specific frames from the trace, which our stripped-down version falsely adds and for which there is no memory mapping available). Thus we should reconsider the dynamic loading for the sentry_unwind_stack() use case where we pass NULL as the address as soon as we have a more sensible source for the crash context (more on that below).

So this doesn't sound great for the outcome of a time-consuming deep dive since most options we have, even in the latest Android releases, at least provide one essential regression to "our" stripped-down libunwindstack. And none of them solve the symbolication issue. During my testing, I also used the adb logcat crash summaries from debuggerd for comparison and remembered that Android also stored tombstones. The amount of (native) crash context a tombstone contains is considerable. It provides:

And the kicker is: the tombstones for native crashes are available to apps without additional permissions starting with Android R. You can essentially do this now:

ActivityManager activityManager =
    (ActivityManager) getApplicationContext().getSystemService(Context.ACTIVITY_SERVICE);

if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.R) {
  List<ApplicationExitInfo> exitReasons =
      activityManager.getHistoricalProcessExitReasons(null, 0, 0);
  for (ApplicationExitInfo reason : exitReasons) {
    if (reason.getReason() != ApplicationExitInfo.REASON_CRASH_NATIVE) continue;
    try (InputStream traceInputStream = reason.getTraceInputStream()) {
      // traceInputStream contains a protobuf encoded tombstone in the following format
      // https://android.googlesource.com/platform/system/core/+/refs/heads/master/debuggerd/proto/tombstone.proto
    } catch (IOException e) {
      Sentry.captureException(e);
    }
  }
}

This means on Android R+ we can potentially eliminate the need for sentry-native to produce crash contexts for NDK crashes and on a level of fidelity that is impractical to achieve from the NDK side. The tombstones weigh 100s of KiB but are compressible to a 1/10th (not methodical, just after a couple of experiments). We need to figure out how to map the remaining run meta-data (scope, tracing, etc.) produced by your users when using sentry-native in NDK to the ApplicationExitInfo provided by the ActivityManager (and where this mapping should take place: on the device or in the backend). For this, I suggest we set up a sync next week(?) to gather the issues that will come up if we attempt to change the interface between sentry-android and sentry-native in such a fashion. Please also let me know if I should prepare something for which you already have concrete questions.

How does this help https://github.com/getsentry/sentry-react-native/issues/2226? Tbh, I am not sure how much an RN app developer can extract from this, but they at least get the same data quality (and way more data) they would get via the debuggerd logs in adb logcat but for a device in the field. And they could point out to the RN devs that a Java RuntimeException that escaped into the "newArch" codegen-layer leads to a SIGSEGV in the Android Runtime.

Swatinem commented 1 year ago

browsing the API here, I noticed getProcessStateSummary which returns binary data that was attached via ActivityManager.setProcessStateSummary so I believe that would be a solution to attach the sentry state and read that back from the crash report.

supervacuus commented 1 year ago

Yes, and this wouldn't have to register every state update, but could also be just a reference to some process-state store (or just the run UUID).

marandaneto commented 1 year ago

Yeah, this is something I looked into in the past as well. https://github.com/marandaneto/android-exit-reasons-api-sample But it was just a preview (Android 11), I guess we can start considering it.

Thanks @supervacuus for the detailed reply.