DataDog / dd-sdk-flutter

Flutter bindings and tools for utilizing Datadog Mobile SDKs
Apache License 2.0

No implementation found for method addError on channel datadog_sdk_flutter.rum #596

Closed androidmitry closed 1 month ago

androidmitry commented 5 months ago

Stack trace

Fatal Exception: io.flutter.plugins.firebase.crashlytics.FlutterError: MissingPluginException(No implementation found for method addError on channel datadog_sdk_flutter.rum)
  at MethodChannel._invokeMethod(platform_channel.dart:332)
  at ._willHandleError(helpers.dart:14)

Reproduction steps

Add the datadog_flutter_plugin package, release to App Store

Volume

0,0021 (1-2 users per day)

Affected SDK versions

2.4.0

Does the crash manifest in the latest SDK version?

Yes

Flutter Version

3.19.5

Setup Type

Flutter Application

Device Information

OS: Android

Per version: Android 12 - 88%, Android 10 - 7%, Android 14 - 3%, Android 13 - 2%

Per device: Samsung - 93%, OnePlus - 7%

Other relevant information

Device states: background 60%

fuzzybinary commented 5 months ago

Hi @androidmitry ,

Thanks for the report, I'll look into this as soon as I can.

Can you give me any more information about a possible reproduction? Have you been able to reproduce locally at all? Is there anything strange about your setup that might be disconnecting the MethodChannel from our plugin? We tend to wrap every call we make to try to avoid crashes, so I'm very concerned that this is causing a crash.

androidmitry commented 5 months ago

Hi @fuzzybinary, unfortunately that's all the information I have so far. I wasn't able to reproduce it. We had some custom platform code, but it was removed. What's interesting is that the number of reports is decreasing. I will update the issue if the crash goes away.

fuzzybinary commented 5 months ago

@androidmitry Yeah if you can keep me posted I would appreciate it.

I've seen issues in the past where the method channel can get disconnected from the plugin, but I've fixed those, and most threw errors in the native layer, not Dart.

nirmal0707 commented 4 months ago

This issue is also occurring in version 2.1.0, and we have encountered the MissingPluginException from the Android channels datadog_sdk_flutter.rum and datadog_sdk_flutter.logs in the production release. Due to consecutive RUM events, the error count is excessively high. Below are some error messages we've received:

  1. MissingPluginException(No implementation found for method addError on channel datadog_sdk_flutter.rum)
  2. MissingPluginException(No implementation found for method createLogger on channel datadog_sdk_flutter.logs)
  3. MissingPluginException(No implementation found for method stopView on channel datadog_sdk_flutter.rum)

fuzzybinary commented 4 months ago

Hi @nirmal0707,

I'm actively investigating this, but I haven't had much luck reproducing it. Do you happen to have any steps to reproduce, or anything you can tell me about your app before / after you started seeing the errors?

nirmal0707 commented 4 months ago

Hi @fuzzybinary ,

This issue was not reproducible but began occurring when we migrated our codebase to Flutter 3.16.4, three months ago. Previously, we were using version 1.5.1, and the Flutter upgrade required us to move the package version to 2.1.0, resulting in this issue arising for some users in production.

fuzzybinary commented 4 months ago

Alright, thanks @nirmal0707, That may help me track down the issue.

fuzzybinary commented 4 months ago

Hi folks -- a few questions for everyone to see if I can try to diagnose this:

Sorry this is taking so long, but I am having a really hard time reproducing, even when forcing certain error states. Comparing with Crashlytics, we perform the registration and de-registration of our method channels the same way they do, so I'm not sure how or why they'd catch the errors and we don't.

fuzzybinary commented 4 months ago

Another question as I continue to investigate -- Does anyone have any customizations of their FlutterActivity? Overriding onCreate, configureFlutterEngine, onDestroy or any other methods?

androidmitry commented 4 months ago

For us, crash reports started coming in when we upgraded Flutter from 3.16.9 to 3.19.5.

Is anyone using background tasks or foreground services?

We have a foreground service, but we don't use the flutter_background_service package. Also, according to the breadcrumb events attached to the crash, it usually happens in the foreground.

Are you using push notifications or a push notification service like firebase_cloud_messaging? Do these errors tend to spike immediately after a push notification is sent out?

Yes. No.

Is anyone using Flutter in an add to app scenario, or using attachToExisting in the SDK?

No

Does GeneratedPluginRegistrant.java enclose all the plugins in a try/catch block?

Yes

Do the MissingPluginException errors correlate with any other errors around the same time?

I checked several users and no other issues were reported around the same time.

Does anyone have any customizations of their FlutterActivity?

We do, I will double check them.
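
On the GeneratedPluginRegistrant question above: one plausible reason for asking (an inference, not something stated in the thread) is that if plugin registration is not isolated per plugin, a failure in one plugin can stop later plugins from registering, and their channels would then have no handler. The sketch below is a self-contained Java model of that idea. It does not use Flutter's real classes; `RegistrationModel`, `registerAll` and the plugin names other than datadog_flutter_plugin are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of plugin registration; not Flutter's generated code.
final class RegistrationModel {

    // Runs each plugin's registration in order and returns the names that succeeded.
    static List<String> registerAll(Map<String, Runnable> plugins, boolean isolateFailures) {
        List<String> registered = new ArrayList<>();
        try {
            for (Map.Entry<String, Runnable> entry : plugins.entrySet()) {
                if (isolateFailures) {
                    // Per-plugin try/catch: a failure is logged and the loop continues.
                    try {
                        entry.getValue().run();
                        registered.add(entry.getKey());
                    } catch (RuntimeException e) {
                        System.err.println("Error registering plugin " + entry.getKey());
                    }
                } else {
                    // No isolation: the first failure escapes the loop.
                    entry.getValue().run();
                    registered.add(entry.getKey());
                }
            }
        } catch (RuntimeException e) {
            System.err.println("Registration aborted: " + e.getMessage());
        }
        return registered;
    }

    // Two plugins: the first (made up) fails, the second registers normally.
    static Map<String, Runnable> samplePlugins() {
        Map<String, Runnable> plugins = new LinkedHashMap<>();
        plugins.put("some_failing_plugin", () -> {
            throw new IllegalStateException("native library missing");
        });
        plugins.put("datadog_flutter_plugin", () -> { });
        return plugins;
    }

    public static void main(String[] args) {
        System.out.println("isolated:   " + registerAll(samplePlugins(), true));
        System.out.println("unisolated: " + registerAll(samplePlugins(), false));
    }
}
```

In this model the later plugin only registers when failures are isolated, which is why a "Yes" answer makes an unrelated plugin's registration failure a less likely explanation.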

feinstein commented 4 months ago

My error message is a bit different: MissingPluginException(No implementation found for method reportLongTask on channel datadog_sdk_flutter.rum)

These are my Sentry logs:

image

Then a bunch of:

image

And then:

image

Maybe you are not handling the destroyed lifecycle correctly? Or another plugin is interfering?

fuzzybinary commented 4 months ago

Hi @feinstein, thanks for the additional information. All of the MissingPluginException issues are related, regardless of the method channel named and the method recorded, so any additional info is helpful.

The FlutterJNI error is interesting, that wouldn't be us so I'm very curious what might cause that, and curious if they're related.

We actually don't handle activity lifecycle at all, instead relying on Flutter's onAttachedToEngine and onDetachedFromEngine, which is what makes this error so frustrating, as those should be triggered properly when Flutter itself starts and stops.

Have you been able to reproduce locally at all?

feinstein commented 4 months ago

AFAIK Flutter JNI is the Java interop for connecting the C++ Flutter engine to the Android app.

Maybe Flutter is not triggering the engine's life cycle correctly to your lib.

androidmitry commented 4 months ago

We were not able to reproduce it locally. We made some tiny changes to our FlutterActivity, I will report if it helped.

btrautmann commented 4 months ago

@fuzzybinary just noting that we are still experiencing the issue mentioned in https://github.com/DataDog/dd-sdk-flutter/issues/552 (which I believe is the same issue being tracked here) despite removing the native cruft I referred to in my last comment on that issue. IIRC I am able to reproduce this in our application fairly consistently. If I have a sec today I'll play around and see if I can reproduce. According to another engineer on my team we're seeing ~249k instances of this issue per week. We've had to filter these issues out of our crash reporting to avoid going beyond our contracted threshold 🙃

fuzzybinary commented 4 months ago

STR would be ridiculously helpful. If I can reproduce I can likely get it fixed and out with the next version ASAP.

maks-ucs commented 4 months ago

@fuzzybinary Just chiming in again on @nirmal0707's behalf. Looking at our Sentry error logs, we also see a large number of lifecycle events being reported in quick succession in the error events for this:

image

And the above screenshot is only about a quarter of the pause/resume breadcrumb events in that particular Sentry error event.

Not sure if that's relevant, but perhaps this rapid set of lifecycle events causes some sort of race condition in the Datadog plugin's setup code?

feinstein commented 4 months ago

This looks weird, so many transitions in under 1 second.

What makes me exclude a Flutter error is that only the DD plugin is raising this exception... but on second thought, few packages would trigger a method channel call when the app is being destroyed.

fuzzybinary commented 4 months ago

Another question from research:

Is anyone suffering from this error still using runZonedGuarded over PlatformDispatcher.instance.onError? (If you are using Datadog.runApp we do not use runZonedGuarded)

I'm looking for commonalities here, since I cannot reproduce with any example I have, but all of my examples use PlatformDispatcher.

androidmitry commented 4 months ago

We use runZonedGuarded. Is it deprecated? We set PlatformDispatcher.instance.onError as well.

fuzzybinary commented 4 months ago

PlatformDispatcher.instance.onError is preferred and the two do essentially the same thing.

I'm going to do more research but I'm curious if the new zone creation is occasionally bypassed by backgrounding / foregrounding.

fuzzybinary commented 4 months ago

Tests on my side related to runZonedGuarded don't duplicate the issue unfortunately.

Next question -- is everyone experiencing this potentially using multiple Flutter engines or booting engines themselves for any reason? There is a potentially related Flutter issue if so. Doing a quick scan of the issue it's possible we might be able to fix this on the Datadog side, but knowing would help me focus efforts.

maks-ucs commented 4 months ago

Thanks for your continued efforts on this @fuzzybinary ! 👍

For our app we are not using multiple Flutter engines and we do use runZonedGuarded, though from your last comment it seems that's likely not the source of the issue.

feinstein commented 4 months ago

I am also using runZonedGuarded, I initialize Sentry, then DataDog. Here's how I initialize it:

Future<void> setupDatadog() async {
  final configuration = DatadogConfiguration(
    clientToken: 'mytoken1234',
    env: appFlavor ?? 'no-flavour',
    site: DatadogSite.us5,
    nativeCrashReportEnabled: true,
    loggingConfiguration: DatadogLoggingConfiguration(),
    rumConfiguration: DatadogRumConfiguration(
      applicationId: 'my-app-id-1234',
    ),
  );

  final originalOnError = FlutterError.onError;
  FlutterError.onError = (details) {
    DatadogSdk.instance.rum?.handleFlutterError(details);
    originalOnError?.call(details); // This allows me to not override other listeners, like Sentry.
  };
  final platformOriginalOnError = PlatformDispatcher.instance.onError;
  PlatformDispatcher.instance.onError = (e, st) {
    DatadogSdk.instance.rum?.addErrorInfo(
      e.toString(),
      RumErrorSource.source,
      stackTrace: st,
    );
    return platformOriginalOnError?.call(e, st) ?? false;
  };

  await DatadogSdk.instance.initialize(configuration, TrackingConsent.granted);
  DatadogSdk.instance.updateConfigurationInfo(LateConfigurationProperty.trackErrors, true);
}

That function is called inside a runZonedGuarded, after await SentryFlutter.init and WidgetsFlutterBinding.ensureInitialized();.

fuzzybinary commented 3 months ago

Hi folks - we still cannot reproduce this issue unfortunately. My guess is that this is some sort of race condition on the platform channel during backgrounding, where we are attempting to send view or log events while the app is backgrounding on Android.

However, I will say we do know that even though Sentry / Crashlytics report this as a “Fatal” error, it does not result in the application terminating, and is silent to the user. I verified this by essentially “force disconnecting” the method channel during testing and seeing what the response is from Flutter. This means that users are not seeing a degraded app experience because of this issue.

This doesn’t mean we don’t take the issue seriously, and if anyone can provide us with reproduction steps that would be incredibly helpful.

feinstein commented 3 months ago

Maybe contact the flutter team and ask them what might be causing this?

fuzzybinary commented 3 months ago

I've gone through some of the less formal channels (Discord, for example), but I may raise a github issue and see if it gets more attention.

androidmitry commented 3 months ago

None of the previous changes I made helped. We are planning a Flutter SDK upgrade. I will post here if it helps.

btrautmann commented 3 months ago

@fuzzybinary I've been unable to give this attention due to some other pressing work, but I wanted to respond to:

Next question -- is everyone experiencing this potentially using multiple Flutter engines or booting engines themselves for any reason? There is a potentially related Flutter issue (https://github.com/flutter/flutter/issues/103483) if so. Doing a quick scan of the issue it's possible we might be able to fix this on the Datadog side, but knowing would help me focus efforts.

A coworker of mine was toying around with this and was able to confirm that there's a case where a user taps a deep link and in doing so a new Flutter engine gets created (I think because of some code we have on the native side, I doubt that this is default Flutter behavior). As a result, our main function is called again which calls the code that would initialize Datadog twice. My hunch (without really looking at the code on either side, I'm just leaving this comment between tasks) is that the move to a singleton on your end made this bug which was already occurring more obvious (because of all the errors we're seeing).

Obviously we have more triaging and likely some fixes to put in on our end, but I did want to (cautiously) confirm your hypothesis that the 2 engine thing may be one cause of the issue folks are seeing.

fuzzybinary commented 3 months ago

Thanks @btrautmann, that's really good information to have. I'm not sure if all of these issues are related to multiple Flutter engines, but I feel like it's possible there are situations I don't know about that could legitimately create a second Flutter engine.

I'll have to think about how we can support that situation, but knowing that, I can create a fake situation that artificially creates multiple engines and test that my solution works.

I'll try to get a solution for you in the next few weeks.

btrautmann commented 3 months ago

@fuzzybinary a couple interesting things to note here that I hope help:

Probable cause seems to be the singleton migration: Both https://github.com/DataDog/dd-sdk-flutter/issues/596#issuecomment-2103935904 and my issue https://github.com/DataDog/dd-sdk-flutter/issues/552 mention version 2.1.0 as the first version containing this issue. IIRC that was when the singleton was introduced.

but I feel like it's possible there are situations I don't know about that could legitimately create a second Flutter engine.

The default launchMode for Android Flutter applications is singleTop. The docs here do mention several scenarios in which a new instance of the MainActivity would be created. Ours would, I think, fall under the scenario of an Intent being created in a new task (that therefore does not contain an instance of our Activity), and I imagine there are a bunch of use cases where this could happen to other apps:

Screenshot 2024-06-14 at 10 21 30

IIRC, each Flutter Engine by default is scoped to the MainActivity so if you have 2 instances of that Activity you'd have 2 flutter engines. Sans a singleton, this in theory should work fine but a singleton within the same process (without a workaround to avoid double-initialization) would I think mean running into this issue.
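
The two-engine theory above can be sketched as a small, self-contained Java model. This is only an illustration of one way a process-wide singleton could leave a live engine without a handler; it is not the SDK's actual code, and `FakeEngine`, `SingletonRumBinding` and `tryCall` are hypothetical names. Only the channel name and the error text are taken from the thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical model; none of these types exist in Flutter or the Datadog SDK.
final class SingletonBindingDemo {
    static final String CHANNEL = "datadog_sdk_flutter.rum";

    // Stand-in for one Flutter engine's messenger: channel name -> handler.
    static final class FakeEngine {
        private final Map<String, UnaryOperator<String>> handlers = new HashMap<>();

        void setHandler(String channel, UnaryOperator<String> handler) {
            if (handler == null) {
                handlers.remove(channel);
            } else {
                handlers.put(channel, handler);
            }
        }

        // A call with no registered handler fails, like MissingPluginException.
        String invoke(String channel, String method) {
            UnaryOperator<String> handler = handlers.get(channel);
            if (handler == null) {
                throw new IllegalStateException("No implementation found for method "
                        + method + " on channel " + channel);
            }
            return handler.apply(method);
        }
    }

    // Assumed singleton behaviour: it tracks one engine and moves the handler on re-attach.
    static final class SingletonRumBinding {
        static final SingletonRumBinding INSTANCE = new SingletonRumBinding();
        private FakeEngine current;

        private SingletonRumBinding() { }

        void attach(FakeEngine engine) {
            if (current != null) {
                current.setHandler(CHANNEL, null); // the earlier engine loses its handler
            }
            current = engine;
            engine.setHandler(CHANNEL, method -> "handled " + method);
        }
    }

    // Returns the handler result, or the error message if no handler is bound.
    static String tryCall(FakeEngine engine, String method) {
        try {
            return engine.invoke(CHANNEL, method);
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        FakeEngine first = new FakeEngine();
        FakeEngine second = new FakeEngine();
        SingletonRumBinding.INSTANCE.attach(first);
        System.out.println(tryCall(first, "addError"));
        SingletonRumBinding.INSTANCE.attach(second); // e.g. a deep link starts a second Activity
        System.out.println(tryCall(first, "addError"));
        System.out.println(tryCall(second, "addError"));
    }
}
```

In this model the first engine's Dart code keeps running after the second attach, so every RUM call it makes fails until the process ends, which would be consistent with the high error counts reported earlier in the thread.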

As a disclaimer, this is all theoretical and it's been a while since I've really been in the weeds on any Android code so please take what I'm saying with a hefty grain of salt.

feinstein commented 3 months ago

Just to add a bit more information, my app has no custom native code from us, but we do have deeplinks, so you might be onto something there...

fuzzybinary commented 3 months ago

I do agree the singleton migration was likely the cause, though it solved other issues related to engine initialization / shutdown.

I have a potential fix in mind that I'm going to look into implementing this week, but I'll likely release it as a preview release to get some feedback before mainlining the changes. I'll post to this thread when the preview version is available.

fuzzybinary commented 2 months ago

The newest version (2.6.0) no longer uses singletons to manage channel connections. Can some folks upgrade and see if this solves the issue?
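
As a rough illustration of why dropping the singleton could help, here is a self-contained Java model in which each engine gets its own binding object, so attaching or detaching one engine cannot affect another. This is a sketch under assumptions, not the 2.6.0 implementation; `ModelEngine`, `RumBinding` and `canCall` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical model; not the actual plugin code.
final class PerEngineBindingDemo {
    static final String CHANNEL = "datadog_sdk_flutter.rum";

    // Stand-in for one Flutter engine's messenger: channel name -> handler.
    static final class ModelEngine {
        private final Map<String, UnaryOperator<String>> handlers = new HashMap<>();

        void setHandler(String channel, UnaryOperator<String> handler) {
            if (handler == null) {
                handlers.remove(channel);
            } else {
                handlers.put(channel, handler);
            }
        }

        String invoke(String channel, String method) {
            UnaryOperator<String> handler = handlers.get(channel);
            if (handler == null) {
                throw new IllegalStateException("No implementation found for method "
                        + method + " on channel " + channel);
            }
            return handler.apply(method);
        }
    }

    // One instance per engine; no static state is shared between instances.
    static final class RumBinding {
        private ModelEngine engine;

        void onAttachedToEngine(ModelEngine attachedEngine) {
            engine = attachedEngine;
            engine.setHandler(CHANNEL, method -> "handled " + method);
        }

        void onDetachedFromEngine() {
            if (engine != null) {
                engine.setHandler(CHANNEL, null); // only this binding's engine is touched
                engine = null;
            }
        }
    }

    // True if the engine currently has a handler for the RUM channel.
    static boolean canCall(ModelEngine engine, String method) {
        try {
            engine.invoke(CHANNEL, method);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ModelEngine first = new ModelEngine();
        ModelEngine second = new ModelEngine();
        RumBinding firstBinding = new RumBinding();
        RumBinding secondBinding = new RumBinding();
        firstBinding.onAttachedToEngine(first);
        secondBinding.onAttachedToEngine(second);
        secondBinding.onDetachedFromEngine(); // the second Activity goes away
        System.out.println("first engine still handled: " + canCall(first, "stopView"));
        System.out.println("second engine handled: " + canCall(second, "stopView"));
    }
}
```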

btrautmann commented 1 month ago

Thanks @fuzzybinary. I put up a PR today to bump to 2.6.0 and will report back based on our Sentry numbers :) It should be fairly obvious whether this resolved the issue.

androidmitry commented 1 month ago

@fuzzybinary I no longer see the issue a week after releasing an app with the new SDK. Thank you!

fuzzybinary commented 1 month ago

Excellent! Thanks so much for letting me know. I'm going to close this but if anyone sees this issue again please reach out.

btrautmann commented 2 weeks ago

Confirming that we're seeing the same, a decrease in instances of this issue. Thanks for the work on this @fuzzybinary!

Screenshot 2024-09-09 at 09 59 36