getsentry / sentry-dart

Sentry SDK for Dart and Flutter
https://sentry.io/for/flutter/
MIT License

Support Profiling #1106

Open marandaneto opened 1 year ago

marandaneto commented 1 year ago

Description

Similar to https://docs.sentry.io/platforms/android/profiling/, but for Dart code.

Related issues on the Dart SDK: https://github.com/dart-lang/sdk/issues/3686, https://github.com/dart-lang/sdk/issues/37664, https://github.com/dart-lang/sdk/issues/50055, and on Flutter: https://github.com/flutter/flutter/issues/37204

marandaneto commented 1 year ago

Right now this is possible if you initialize the SDK manually.

Enable Performance and Profiling directly on the native SDKs, for example on Android: docs.sentry.io/platforms/android/performance and docs.sentry.io/platforms/android/profiling

The same steps apply to iOS. This would work for Android and iOS native code only, not for the Dart code nor for C/C++ code.

bruno-garcia commented 1 year ago

Hey @marandaneto, I was talking to @vaind and he might take a look at this to see how hard it is and what options we have.

kahest commented 1 year ago

Hey @bruno-garcia @vaind are there any updates on this?

vaind commented 1 year ago

@kahest no updates yet, just started looking into this recently

vaind commented 1 year ago

This article by a Dart SDK developer gives an introduction to how profiling is implemented in the Dart VM and exposed via DevTools. TLDR:

Therefore, this looks like a dead end.

I'll update here if I can find an alternative solution, e.g. isolate stacks sampling from dart directly.

marandaneto commented 1 year ago

@vaind what if we propose making this available in release builds behind a build-time opt-in flag? The port is closed in this case. Could we reuse most of the profiler implementation if we get buy-in from the Dart team for this (raising an issue and so on)?

vaind commented 1 year ago

@marandaneto I've considered that too, but I'm not sure it's feasible because the VM service port is exposed to every app on the device. It doesn't really matter whether it's opt-in at build time; you wouldn't want to distribute such an app, especially on mobile devices.

marandaneto commented 1 year ago

@vaind that was my point: can we change this approach regarding the port? For example, by finding another way to consume the service without opening the port, or via https://github.com/dart-lang/sdk/issues/37664

krystofwoldrich commented 1 year ago

To send profiles, the SDK will need to update the way it sends/enriches events on Android.

The Outbox sender opens the saved envelope and sends its items as individual envelopes, but a profile has to be ingested together with its transaction in one envelope.
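To make the constraint concrete, here is a minimal Dart sketch of one envelope carrying both the transaction and its profile. It assumes the Sentry envelope layout (a JSON envelope header line, then an item header line and a payload line per item) with the item `length` given in UTF-8 bytes. `buildEnvelope` is a hypothetical helper for illustration, not part of the sentry-dart API, and real envelopes carry more header fields.

```dart
import 'dart:convert';

/// Hypothetical helper (not a sentry-dart API): serializes a transaction and
/// its profile into ONE envelope so both are ingested together.
String buildEnvelope({
  required String eventId,
  required Map<String, Object?> transaction,
  required Map<String, Object?> profile,
}) {
  final buffer = StringBuffer();
  // Envelope header: a single JSON object on its own line.
  buffer.writeln(jsonEncode({'event_id': eventId}));

  // Each item is an item-header line followed by its payload line.
  void addItem(String type, Map<String, Object?> payload) {
    final body = jsonEncode(payload);
    // `length` is the payload size in UTF-8 bytes.
    buffer.writeln(
        jsonEncode({'type': type, 'length': utf8.encode(body).length}));
    buffer.writeln(body);
  }

  addItem('transaction', transaction);
  addItem('profile', profile);
  return buffer.toString();
}
```

Sending the two items as separate envelopes, as the Outbox sender currently does, is what would break this pairing.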

vaind commented 1 year ago

So apparently, native profilers should work with AOT-compiled Dart. I'm going to see if I can make it work with our existing native SDK profilers. See this thread on Discord:

mraleph: Anything that works for native code will work just the same for Dart, so if you have some sampling profiler for C++ / Objective-C / Swift then you can just use that. AOT compiled binaries are just normal native binaries (at least on Linux, Android and Mac OS X / iOS - Windows is an exception) which just need some runtime support to run. our calling conventions are fairly traditional (frames are linked through framepointer) and we generate eh_frame / debug_frame so non-FP based unwinders should also be able to unwind the stack. This means native tools like perf and Instruments work just fine with AOT compiled Dart code and I also know that https://gperftools.github.io/gperftools/cpuprofile.html (which is a very simple profiler which simply unwinds stack using frame-pointer chaining) works as well. (There is one minor catch which trips over simpleperf on Android ARM64 - but I don't think it matters much if you just write a manual unwinder which follows FP chain)
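The frame-pointer walk described above can be modelled in a few lines. Dart cannot read another thread's stack, so this sketch simulates stack memory with a map; it assumes the common AArch64/x86-64 frame record layout (saved caller frame pointer at `fp`, return address at `fp + 8`). `Memory` and `unwindFramePointers` are hypothetical names; a real sampling profiler would do this in native code against a suspended thread.

```dart
/// Simulated stack memory (address -> 64-bit word). A real unwinder would
/// read these words from the sampled thread's stack instead.
typedef Memory = Map<int, int>;

/// Hypothetical frame-pointer unwinder. Assumes the usual AArch64/x86-64
/// frame record: the word at fp holds the caller's saved FP, the word at
/// fp + 8 the return address. Returns the sampled pc followed by one
/// return address per caller frame.
List<int> unwindFramePointers(Memory memory, int pc, int fp,
    {int maxFrames = 128}) {
  final pcs = <int>[pc];
  while (fp != 0 && pcs.length < maxFrames) {
    final savedFp = memory[fp];
    final returnAddress = memory[fp + 8];
    if (savedFp == null || returnAddress == null) break; // unreadable frame
    pcs.add(returnAddress);
    // Stacks grow down, so callers sit at higher addresses; a frame pointer
    // that does not increase means the chain is corrupt, so stop here.
    if (savedFp != 0 && savedFp <= fp) break;
    fp = savedFp;
  }
  return pcs;
}
```

A zero saved frame pointer marks the outermost frame and ends the walk; the `maxFrames` cap guards against cycles in a damaged chain.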

marandaneto commented 1 year ago

@vaind the Android profiler currently only profiles Java/Kotlin code, not native (C/C++) code; maybe the iOS one works though.

vaind commented 1 year ago

Update: the sentry-cocoa profiler seems to work, somewhat. In a Flutter app on macOS, I started a transaction in Swift, then ran a heavy operation in Dart and stopped the Swift transaction afterwards. The profile is captured and, after symbolication, it shows function names, although line numbers are not consistently available... See sample profile.

On the other hand, the CPU profiler is going to show work that is actually being executed, so in case of async-await, it may get more complicated to see what is actually going on. I'll have to devise a better testing app to evaluate that.
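Once stacks are sampled, sampled-profile formats commonly store each unique frame and each unique stack once and let every sample refer to a stack by index, which keeps payloads small. The sketch below is a simplified model in that spirit, using function names as frames; `SampledProfile` and `Sample` are hypothetical and do not reproduce the exact Sentry profile schema.

```dart
/// One recorded sample: which stack was seen, and when.
class Sample {
  Sample(this.stackId, this.elapsedNs);
  final int stackId;
  final int elapsedNs;
}

/// Simplified sampled-profile model (hypothetical): unique frames and unique
/// stacks are stored once; samples refer to stacks by index.
class SampledProfile {
  final List<String> frames = [];
  final List<List<int>> stacks = [];
  final List<Sample> samples = [];

  final Map<String, int> _frameIds = {};
  final Map<String, int> _stackIds = {};

  /// Records one sample. The stack lists function names, leaf first.
  void addSample(List<String> stack, int elapsedNs) {
    // Map each function name to a frame index, adding new frames once.
    final frameIds = <int>[
      for (final name in stack)
        _frameIds.putIfAbsent(name, () {
          frames.add(name);
          return frames.length - 1;
        }),
    ];
    // Identical stacks share one entry in `stacks`.
    final stackId = _stackIds.putIfAbsent(frameIds.join(','), () {
      stacks.add(frameIds);
      return stacks.length - 1;
    });
    samples.add(Sample(stackId, elapsedNs));
  }
}
```

Repeated samples of the same hot Dart function then cost one small index each instead of a full copy of the stack.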

vaind commented 1 year ago

After some testing, it seems like the native profiler could be the way to go, at least for iOS and macOS. However, I'm having issues with symbolication: all the Flutter symbols (e.g. referencing /private/var/containers/Bundle/Application/2D6824B6-DC93-4F60-AFC2-ADE201585EC6/Runner.app/Frameworks/Flutter.framework/Flutter) in this profile are unresolved (symbol not found). Was there some custom handling in the symbolicator for Flutter-specific symbols? Maybe it doesn't get triggered when the transaction comes from Swift... I've also tested triggering an error in Swift, and the image seems resolved but the frames say redacted. Do you know what that is about @marandaneto?

marandaneto commented 1 year ago

@vaind not aware of any changes/bugs. There were changes specifically for Flutter, mostly around source maps IIRC. I also recall this https://github.com/getsentry/symbolic/blob/11472bfbb31f2ed76802ff50bfc40a2b0852ee1b/symbolic-debuginfo/src/dwarf.rs#L519-L521 but I'm not sure it has any impact. Are the redacted frames inApp, or are they from system apps/3rd-party libs?

vaind commented 1 year ago

OK, so this would definitely need more attention to work properly. I'm not sure investigating this deeply makes sense just yet, with other platforms still unresolved. I think we should first make sure all other desired platforms can be supported before doing the detailed work on iOS. WDYT @marandaneto?

Also, I understand the goal would be to support all platforms supported by Flutter. However, if we go the route of native profiling, that means the platforms would be evaluated & implemented one by one. Would that be acceptable? If so, what are the priorities for platform support and is there a hard stop if some specific platform cannot be supported?

marandaneto commented 1 year ago

@vaind makes sense. I'd focus on iOS and Android first, most likely starting with iOS since the iOS profiler should work (with a few gotchas, as you stated). Next is Android, although we'd need a different solution, probably something built into https://github.com/getsentry/sentry-native? Maybe @stefanosiano and/or @indragiek can chime in here; they may know of or have investigated C/C++ profilers for Android, as opposed to the current Java/Kotlin-only approach.

I know this: https://developer.android.com/topic/performance/tracing/custom-events-native

I wonder whether the Android native profiler would also work for Windows and Linux, but that's definitely a stretch since sentry-native isn't built into Sentry Flutter yet anyway.

vaind commented 1 year ago

Good, my idea was to verify the feasibility of native profiling on Flutter with the Android SDK (as you mentioned, most likely via sentry-native). A PoC would be enough IMO, and then we can finish iOS first before fully implementing Android.

vaind commented 1 year ago

Some notes on Android profiling:

marandaneto commented 1 year ago

@vaind your best bet for finding out which native profilers work well on Android for native code/NDK (at runtime, low frequency, release mode) would be to ask in the Android United Slack community. There's a #ndk channel and some Googlers are there, including @DanAlbert, who is one of the lead contributors to https://github.com/android/ndk

If we can't use simpleperf directly, they might know some other options.

marandaneto commented 1 year ago

https://android.googlesource.com/platform/system/extras/+/refs/heads/main/simpleperf/doc/android_application_profiling.md Apparently simpleperf can only profile apps built in debug or profile mode; there's a workaround, but it still depends on adb.

If you want to profile a release build of an application: For the release build type, Android studio sets android::debuggable=“false” in AndroidManifest.xml, disables JNI checks and optimizes C/C++ code. However, security restrictions mean that only apps with android::debuggable set to true can be profiled. So simpleperf can only profile a release build under these three circumstances: If you are on a rooted device, you can profile any app.

If you are on Android >= Q, you can add profileableFromShell flag in AndroidManifest.xml, this makes a released app profileable by preinstalled profiling tools. In this case, simpleperf downloaded by adb will invoke simpleperf preinstalled in system image to profile the app.

@vaind did you check the firefox profiler? https://profiler.firefox.com/docs/#/./guide-profiling-android-directly-on-device

Edit: apparently simpleperf as well https://searchfox.org/mozilla-central/source/third_party/libwebrtc/tools_webrtc/android/profiling/perf_setup.sh

DanAlbert commented 1 year ago

I know nothing about Dart.

marandaneto commented 1 year ago

I know nothing about Dart.

Flutter apps written in Dart compile to native code, so it's not really about Dart profilers but rather about Android profilers that are able to profile native code and not only Java/Kotlin.

vaind commented 1 year ago

OK, so at least for errors, some stack frames are not symbolicated because the dSYMs for Flutter.framework (or FlutterMacOS.framework) are missing. They're not currently shipped with Flutter, so the Dart plugin won't upload them to Sentry and thus they can't be used for symbolication, see https://github.com/flutter/flutter/issues/117404#issuecomment-1360064880
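Why a missing dSYM produces "symbol not found" follows from what a symbolicator does with each captured address: subtract the image's load address and look the result up in the symbol table from the debug file. This is a simplified sketch; real tools also account for the image's preferred base address and read symbols from the dSYM rather than from a list. `DebugSymbol` and `symbolicate` are hypothetical names.

```dart
/// One symbol from a debug file: start address relative to the image, plus
/// its size in bytes.
class DebugSymbol {
  const DebugSymbol(this.name, this.start, this.size);
  final String name;
  final int start;
  final int size;
}

/// Resolves an absolute runtime pc. `symbols` must be sorted by start.
/// Returns null when no symbol covers the address, which is what happens for
/// every frame of an image whose debug file is missing.
String? symbolicate(int pc, int imageLoadAddress, List<DebugSymbol> symbols) {
  final rel = pc - imageLoadAddress;
  // Binary search for the last symbol whose start <= rel.
  var lo = 0, hi = symbols.length - 1, found = -1;
  while (lo <= hi) {
    final mid = (lo + hi) >> 1;
    if (symbols[mid].start <= rel) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  if (found < 0) return null;
  final s = symbols[found];
  // The address must also fall inside that symbol's extent.
  return rel < s.start + s.size ? s.name : null;
}
```

With no debug file uploaded for Flutter.framework, the symbol list for that image is effectively empty, so every lookup returns `null`.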

marandaneto commented 1 year ago

@vaind we could probably set up a Flutter symbol server https://docs.sentry.io/platforms/unreal/data-management/debug-files/symbol-servers/ Another option is for the Dart plugin to figure out the correct Flutter version/download link and download/upload the symbols.

vaind commented 1 year ago

And a third option, IMO safer for long-term maintenance, would be to update the flutter tool to include the dSYM together with the rest of the build output. The same applies to iOS, macOS and likely Android symbols.

FYI, after downloading the dSYM manually and uploading it to sentry.io as a DIF, the issue stack trace now looks much better:

marandaneto commented 1 year ago

@vaind totally agree but the issue is ~2y old already, not sure if this will ever be addressed. We can be more proactive and find a solution that won't demand too much work. Symbol servers work with GCP so maybe it's an easy win.

What we can also do for now is amend the docs and let people know they can do this manually (via sentry-cli), so that it's at least documented as a limitation of our automatic approach (linking to the original GH issue).

marandaneto commented 1 year ago

I've filed a feature request for a new built-in Flutter symbol server, let's see if this is possible, and is less work/less to maintain than the other options.

vaind commented 1 year ago

Another symbolication issue on iOS that I'm having trouble with:

While symbolication works reasonably well, one function I've devised to produce a lot of load is getting multiple frames with no line numbers present. I just can't figure out what the issue is and why other frames in the stack trace do have line numbers, even the caller Dart function, which is the second in the stack... Maybe @Swatinem could help out here?

I've uploaded the whole build folder with the debug symbols, the envelope with the captured profile and the profile as downloaded (symbolicated) from Sentry.

marandaneto commented 1 year ago

@vaind I will be OOO until the 7th, but feel free to ping @Swatinem on Discord; @kahest or @krystofwoldrich can be the bridge as well if needed.

mraleph commented 1 year ago

While symbolication works reasonably well, one function I've devised to actually produce a lot of load is getting multiple frames, with no line numbers present.

It might be that the information is simply missing from the DWARF we generate. We emit just enough information to make meaningful stack traces, which means we don't emit any useful DWARF for the places which are not calls. So if you write something like this:

const N = 100000000; // Illustrative iteration count.

int foo() {
  var x = 0;
  for (var i = 0; i < N; i++) {
    x = x * 31 + i; // Do some math without any calls.
  }
  return x;
}

Then the best you can get is that the time is spent in the foo function, but you would not be able to tell where exactly in that function the time is spent.

I suggest just looking at raw PCs that profiler has collected and then looking at the corresponding generated machine code & DWARF to see if this is indeed the case.
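Looking up a raw PC in the DWARF line table works roughly like this: the row with the greatest address not above the PC gives the file and line, and line 0 means the compiler recorded no source location for that code. This is a simplification (real line tables are encoded as a state-machine program and contain end-of-sequence rows); `LineRow` and `lookupLine` are hypothetical names. If, as suspected, no useful line info is emitted for call-free code, PCs sampled inside such a loop would land in a line-0 row and show no line number.

```dart
/// One row of a simplified DWARF line table: the row applies from its
/// address up to the next row's address. Line 0 means "no source line".
class LineRow {
  const LineRow(this.address, this.file, this.line);
  final int address;
  final String file;
  final int line;
}

/// Finds the source location for pc in a table sorted by address.
/// Returns null if pc precedes the first row or maps to line 0.
String? lookupLine(List<LineRow> table, int pc) {
  LineRow? match;
  for (final row in table) {
    if (row.address > pc) break;
    match = row;
  }
  if (match == null || match.line == 0) return null;
  return '${match.file}:${match.line}';
}
```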

vaind commented 1 year ago

Thanks Slava, I suspected as much but wasn't able to confirm it with my limited knowledge of DWARF. I was hoping @Swatinem could have a look at some point. It's not really a blocker, as it seems to be "just" the leaf function code.

Swatinem commented 1 year ago

I’m stretched really thin these days, please ping me again next week :-)

vaind commented 12 months ago

I’m stretched really thin these days, please ping me again next week :-)

@swatinem any chance you could have a look? The latest profile has even less info, probably inlined?

Swatinem commented 12 months ago

I suggest just looking at raw PCs that profiler has collected and then looking at the corresponding generated machine code & DWARF to see if this is indeed the case.

I was looking at the DWARF: it is indeed reporting line 0, column 0 for the whole block of code that is hit by the profiler.

Looking at the very latest profile / debug file you posted, indeed the DWARF only reports that as a single toplevel function, but it has a ton of line table entries from multiple files as well.

So maybe the DWARF info for inlined functions is not being generated correctly.
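For reference, this is roughly how a symbolizer turns one PC into several logical frames when inline information is present: each DW_TAG_inlined_subroutine covers an address range, nested inlines are children in the DWARF tree, and every range containing the PC adds one frame above the concrete function. This is a simplified sketch; `InlinedRange` and `expandInlineFrames` are hypothetical names.

```dart
/// Simplified DW_TAG_inlined_subroutine: code of `name`, inlined into its
/// parent, occupies [lowPc, highPc). `children` are nested inlines.
class InlinedRange {
  const InlinedRange(this.name, this.lowPc, this.highPc,
      [this.children = const []]);
  final String name;
  final int lowPc;
  final int highPc;
  final List<InlinedRange> children;

  bool contains(int pc) => pc >= lowPc && pc < highPc;
}

/// Expands one pc into logical frames, innermost (leaf) first. Without
/// inline records only the concrete function is reported.
List<String> expandInlineFrames(
    String function, List<InlinedRange> inlines, int pc) {
  final frames = <String>[function];
  var level = inlines;
  while (true) {
    final hit = level.where((r) => r.contains(pc));
    if (hit.isEmpty) break;
    frames.add(hit.first.name);
    level = hit.first.children;
  }
  return frames.reversed.toList();
}
```

If the inline records are missing, only the top-level function comes back even when the line table references several files, which matches what was observed in the latest profile.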