DataDog / dd-sdk-ios

Datadog SDK for iOS - Swift and Objective-C.
Apache License 2.0
219 stars 127 forks source link

`application_start` event is not sent for Datadog sdk version >= 2.6.0 #2041

Closed saad352 closed 1 day ago

saad352 commented 2 months ago

Describe the bug

Datadog SDK sends action of type application_start. This is working fine until the version 2.5.1 but this event is not being sent from version 2.6.0 onwards.

More info I have scanned and tried to debug through the sdk code. The data for application_start seems to be gathered properly and written using FileWriter but I don't see data in the RUM Web dashboard. So my guess is that it is being collected but not sent. More over I see this code change to send N batches sequentially. Not sure but could be related to this. My reasoning for pointing to this PR is that we are only missing the first event which is application_start, the rest of the events I see are fine. So may be it is missing the first event and sending the later ones just fine? Or may be something else.

Reproduction steps

Use ddsdk version 2.5.1 and we can see the action application_start but when bumping to 2.6.0 the event does not seems to be sent.

SDK logs

N/A

Expected behavior

application_start event should be seen in the RUM dashboard.

Affected SDK versions

>= 2.6.0

Latest working SDK version

2.5.1

Did you confirm if the latest SDK version fixes the bug?

Yes, it does not fix the issue.

Integration Methods

Cocoapods

Xcode Version

15.4

Swift Version

swiftlang-5.10.0.13

MacOS Version

No response

Deployment Target

iOS 17

Device Information

All

Other relevant information

No response

mariedm commented 2 months ago

Hi @saad352, thank you for reporting this issue. Just to confirm, are you experiencing this on the latest version (2.16) as well?

saad352 commented 2 months ago

Hey @mariedm, Yes it is also occurring on latest version. 2.16.0. The last working version is 2.5.1

mariedm commented 2 months ago

Hi @saad352, thank you for your response. I tested it on my side and can see the application_start event in the RUM dashboard. Could you share a code sample so I can reproduce the issue? Also, are you making any other changes in your code when testing with versions 2.5.1 and 2.16.0?

saad352 commented 2 months ago

Hi @mariedm, I have tried to reproduce the issue in the sample code base but the issue is not happening in the sample code base with 2.16.0. I have copied pasted the same code in our application and it does not send the application_start event with version 2.16.0 but it does send the same event with version 2.5.1.

Below is the the code which I am using in both.

Datadog.initialize(
        with: Datadog.Configuration(
            clientToken: clientToken,
            env: environment,
            site: .us1
        ),
        trackingConsent: .granted
    )

  RUM.enable(
      with: RUM.Configuration(
          applicationID: appID,
          uiKitViewsPredicate: DefaultUIKitRUMViewsPredicate(),
          uiKitActionsPredicate: DefaultUIKitRUMActionsPredicate()
      )
  )

Could there be something else which could be interfering with this in our codebase?

Also, are you making any other changes in your code when testing with versions 2.5.1 and 2.16.0? "

No, we are not making any changes. Just bumping the version.

saad352 commented 2 months ago

With the above new findings that issue is only happening in our codebase, I again checked the changes between 2.5.1(good) and 2.6.0(bad) and I see that there was swizzling introduced in that. I think some other framework in our code base is swizzling maybe same thing and that could cause the issue. I am trying to look more on what is that method which is being swizzled. If you know please share that.

mariedm commented 2 months ago

Hi @saad352,

Thank you for the additional information and for debugging on your side, this is very appreciated.

Could you enable debug logs to see if any error is reported by the SDK, please? Datadog.verbosityLevel = .debug RUMMonitor.shared().debug = true

Something else you could try is to set a breakpoint when the application_start event is being sent here, and verify that it's being triggered.

saad352 commented 2 months ago

Hi @mariedm, No, I don't see any error reported by SDK.

Something else you could try is to set a breakpoint when the application_start event is being sent here, and verify that it's being triggered.

Yes, this part is being triggered and I can see the values.

Do you think if swizzling of ViewController lifecycle events could be an issue here?

mariedm commented 2 months ago

I don't think swizzling has anything to do with it. The application_start event is triggered when enabling RUM, which is independant of any ViewController lifecycle event. Additionally, you've confirmed the event is collected and written.

Given that you are not facing this issue with a code sample, it seems to be specific to your app. Without your code, it's difficult for us to investigate further.

You might also try filtering actions by type using the search bar with action.type:application_start.

image

dcacenabes commented 1 month ago

Hi @mariedm , I'm pairing with @saad352 on this one. We have identified the commit that broke this action for us: d622d2182043fa74801e862eab25c3c6577d0780 After playing around with the changes introduced in this PR, we found that this is the code that makes the application_start action disappear from our RUM:

// Start "ApplicationLaunch" view immediatelly:
startApplicationLaunchView(on: command, context: context, writer: writer)

Commenting that one line out makes the action appear again in Datadog RUM.

maxep commented 1 month ago

Hi @dcacenabes , @saad352 👋

Thank you very much for investigating further. I would be super helpful if you had a sample project that reproduce the issue, I understand it's a big request but would be quicker for us to troubleshoot. If that's not possible, I would kindly ask if you could do further debugging.

Check the app start propagation; make sure the application launch view is properly created by setting breakpoints:

I'm interested to know where and why the propagation fails.

The fact that it works with a vanilla project suggests that the SDK init flow is different in your prod app. Where do you init the SDK? is it called from the app delegate/SwiftUI.App or later during the app process?

Thank you again for your help 🙏

dcacenabes commented 1 month ago

Hi @maxep , Thanks for chipping in!

The fact that it works with a vanilla project suggests that the SDK init flow is different in your prod app. Where do you init the SDK? is it called from the app delegate/SwiftUI.App or later during the app process?

We are initializing the SDK in a call from

  func application(
    _: UIApplication,
    didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?
  ) -> Bool

We're doing it like this:

Datadog.initialize(
      with: .init(
        clientToken: ...,
        env: ...,
        service: ...
      ),
      trackingConsent: ...
    )

    RUM.enable(with: ...)
    Trace.enable(with: ...)
    Logs.enable()

Check the app start propagation; make sure the application launch view is properly created by setting breakpoints:

On it! 🏃 🕵️ Will share my learnings later.

Thank you very much for investigating further. I would be super helpful if you had a sample project that reproduce the issue, I understand it's a big request but would be quicker for us to troubleshoot. If that's not possible, I would kindly ask if you could do further debugging.

We are already working on trimming our project to the bare minimum that reproduces the issue, but it is a complex project 🙏

dcacenabes commented 1 month ago

@maxep I have made some progress on debugging the issue. Although I have not yet figured out exactly where the error is, I have narrowed it down a lot, so I wanted to share my findings here already in case they can be of help. The issue seems to be related with trackingConsent. There seems to be a race condition somewhere in the DD SDK code.

  1. We initialize the Datadog SDK with trackingConsent: .pending
  2. Very shortly after initializing the SDK with tracking consent .granted
  3. Datadog SDK calls the migrateUnauthorizedData(toConsent:)
  4. the application_start action is then written into the unauthorized directory. <- Here's the problem.
  5. Since the migrateUnauthorizedData had already been called, the file that contains application_start action stays in the unauthorized directory and never gets migrated, and because of that it never gets sent.

When we comment out the code:

// Start "ApplicationLaunch" view immediatelly:
startApplicationLaunchView(on: command, context: context, writer: writer)

Then the action is written to the right directory.

dcacenabes commented 1 month ago

I can consistently reproduce it with a sample project.

You simply need to do this in your App Delegate:

func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
    Datadog.initialize(
      with: Datadog.Configuration(
        clientToken: "token",
        env: "stage",
        service: "service-name"
      ),
      trackingConsent: .pending
    )
    RUM.enable(
      with: RUM.Configuration(
        applicationID: "app-id",
        uiKitViewsPredicate: DefaultUIKitRUMViewsPredicate(),
        uiKitActionsPredicate: DefaultUIKitRUMActionsPredicate(),
        urlSessionTracking: RUM.Configuration.URLSessionTracking()
      )
    )
    RUMMonitor.shared().addAttribute(forKey: "build", value: "research-app-start")
    // This one works
//    DispatchQueue.main.async {
//      Datadog.set(trackingConsent: .granted)
//    }
    // This one does not work
     Datadog.set(trackingConsent: .granted)

    return true
  }
maxep commented 1 month ago

Hey @dcacenabes 👋

That's a very interesting finding, thank you very much 🙏

We have now steps to reproduce and we will investigate further on our side, it looks indeed like an issue of the SDK. As a workaround, could you set the trackingConsent decision as a SDK init parameter? or set it before enabling RUM?

I hope this helps. Thank you again for the detailed troubleshooting!

dcacenabes commented 1 month ago

As a workaround, could you set the trackingConsent decision as a SDK init parameter? or set it before enabling RUM?

We're using reactive programming and the tracking consent is being set from a stream, so we can't pass it from the beginning in an easy way. The workaround for us is changing:

.receive(on: ExecutionContext.immediateOnMain)

to

.receive(on: ExecutionContext.main)

This workaround is enough to get the application_start submitted to datadog, but this has now unveiled another issue. We are setting a few attributes to the RUM by using: RUMMonitor.shared().addAttribute(forKey: "key", value: "value") These attributes are correctly set for everything we send except for the application_start action. We can't call RUMMonitor.shared() before calling RUM.enable(with:), and RUM.enable is the one creating the application_start action. So the current state after the latest findings for us is:

  1. Introducing a delay allows us to successfully submit the application_start event
  2. the submitted application_start event is missing the custom attributes we add by using RUMMonitor.shared().addAttribute(forKey: "key", value: "value"), so we can't correctly filter the data in the Datadog RUM web.
mariedm commented 1 week ago

Hi @dcacenabes, thank you for the thorough investigation and detailed report; it's very helpful. We addressed the race condition in PR #2063, which will be included in our next release. Keep an eye out for the update!

ncreated commented 1 day ago

@dcacenabes we just released version 2.20.0 and it includes the fix. Thanks again for narrowing down the problem and finding steps to reproduce 🚀.

dcacenabes commented 1 day ago

Anytime :) Glad that we could be of help! We're just bumping our app to use the latest version and removing the workaround we introduced.