getsentry / team-mobile

Meta issues for the Mobile team

[EPIC] Carrier transactions #157

Closed: philipphofmann closed this issue 1 year ago

philipphofmann commented 1 year ago

As of Nov 14th, 2023, auto-instrumented spans require an active transaction bound to the scope that they can attach to. When no active transaction is bound to the scope, the SDK loses these spans. Instead, we want to send as many spans as possible to Sentry so users get more value from auto performance monitoring.

In the RFC Mobile Transactions and Spans, we agreed to use transactions as carriers for spans. The basic concept is to create a transaction whenever the SDK starts an auto-instrumented span and no transaction is available to attach it to.

All SDKs must hide this feature behind a feature flag, because it creates many transactions, which can make users run out of quota significantly faster. When users enable carrier transactions (maybe we can come up with a better name), the SDK turns off UI event transactions because the two interfere with each other. The long-term goal is for carrier transactions to replace UI event transactions. To avoid losing UI events, the SDK must treat them like any other auto-generated span, such as file I/O or HTTP requests. This means the SDK adds spans for UI events to the carrier or screen load transactions.
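
A minimal sketch of how an SDK might gate the feature, assuming hypothetical option names (`enable_carrier_transactions`, `enable_ui_event_transactions`) that do not come from any real Sentry SDK. It encodes the two rules from the paragraph above: carrier transactions are opt-in, and turning them on switches UI event transactions off in favor of UI event spans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Options:
    # Hypothetical option names, for illustration only.
    enable_carrier_transactions: bool = False  # opt-in: carriers can consume quota quickly
    enable_ui_event_transactions: bool = True

    @property
    def ui_event_transactions_active(self) -> bool:
        # Carrier transactions and UI event transactions interfere with each other,
        # so enabling carriers switches UI event transactions off.
        return self.enable_ui_event_transactions and not self.enable_carrier_transactions

    @property
    def ui_events_as_spans(self) -> bool:
        # With carriers on, UI events are recorded like any other auto-generated span.
        return self.enable_carrier_transactions


defaults = Options()
opted_in = Options(enable_carrier_transactions=True)
print(defaults.ui_event_transactions_active, opted_in.ui_event_transactions_active)  # True False
```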

Similar to UI event transactions, carrier transactions combine two concepts: idle transactions and wait-for-children. If you are unfamiliar with these concepts, read more about them here. The basic idea is that whenever the SDK creates an auto-instrumented span and no transaction is available to attach it to, the SDK creates a transaction. Then, the SDK waits until all children of the transaction finish or a 30-second timer times out. The goal is to send as few transactions as necessary, because each one requires an HTTP request to Sentry, while still sending them quickly, because the app could crash and the SDK could lose data. Once we switch to span ingestion in the future, we no longer need to worry about the timeout. A 30-second timeout is an acceptable value to start with.
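
The combination of an idle timer and wait-for-children can be sketched with a simulated clock so the behavior is deterministic. This is a minimal illustration, not SDK code: `CarrierTransaction`, `Span`, and `check` are hypothetical names. It follows the timeout scenarios in the specification below: once the 30-second idle timer fires, the carrier stops collecting spans and finishes only after every child span has finished.

```python
from dataclasses import dataclass, field
from typing import List

IDLE_TIMEOUT_S = 30.0  # starting value proposed in this issue


@dataclass
class Span:
    op: str
    finished: bool = False


@dataclass
class CarrierTransaction:
    started_at: float  # seconds on a simulated clock
    spans: List[Span] = field(default_factory=list)
    idle_expired: bool = False
    finished: bool = False

    def accepts_spans(self) -> bool:
        # While the idle timer runs, the carrier collects new auto-generated spans.
        return not self.idle_expired

    def check(self, now: float) -> bool:
        """Advance the lifecycle to time `now`; returns True once the carrier is finished."""
        if not self.idle_expired and now - self.started_at >= IDLE_TIMEOUT_S:
            self.idle_expired = True  # idle timer fired
        # Wait-for-children: finish only after the timer fired and all children finished.
        if self.idle_expired and all(span.finished for span in self.spans):
            self.finished = True
        return self.finished


carrier = CarrierTransaction(started_at=0.0, spans=[Span("http.client")])
print(carrier.check(now=10.0))  # False: idle timer still running
print(carrier.check(now=31.0))  # False: timer fired, but the HTTP span is still open
carrier.spans[0].finished = True
print(carrier.check(now=32.0))  # True
```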

The SDK sends carrier transactions as regular transactions to Sentry. They don't have a new envelope item type or a new event type.
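
To illustrate that nothing changes on the wire, here is a sketch that wraps a carrier transaction in a Sentry envelope using the same `transaction` item type as any other transaction. The layout (a JSON envelope header line, a JSON item header with `type` and `length`, then the payload) follows the Sentry envelope format; the `carrier_envelope` helper is hypothetical, and a real SDK adds further headers and fields.

```python
import json
import uuid


def carrier_envelope(transaction: dict) -> bytes:
    """Serialize a carrier transaction as an ordinary transaction envelope item."""
    event_id = uuid.uuid4().hex
    payload = json.dumps({**transaction, "event_id": event_id, "type": "transaction"}).encode()
    envelope_header = json.dumps({"event_id": event_id}).encode()
    # Same item type as every other transaction: carriers need no new item type.
    item_header = json.dumps({"type": "transaction", "length": len(payload)}).encode()
    return b"\n".join([envelope_header, item_header, payload]) + b"\n"


envelope = carrier_envelope({"transaction": "CarrierTransaction"})
print(envelope.decode())
```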

Carrier transactions need to work with profiling. The current proposal should be compatible, because the profiler starts whenever the SDK starts a transaction.

### Tasks
- [ ] https://github.com/getsentry/sentry-cocoa/issues/3407
- [ ] https://github.com/getsentry/sentry-java/issues/3056
- [ ] Flutter
- [ ] React Native
- [ ] Add the specification to the develop docs

### Specification

Carrier transactions all have the name `CarrierTransaction`, the `transaction_info.source` is `component`, the operation is `carrier`, and the `trace.origin` is `auto.carrier`:

```json
{
    "transaction": "CarrierTransaction",
    "transaction_info": {
        "source": "component"
    },
    "contexts": {
        "trace": {
            "op": "carrier",
            "origin": "auto.carrier"
        }
    },
    "spans": []
}
```
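
A sketch of a helper that produces the fixed fields from the specification. `build_carrier_payload` is a hypothetical name, and the result deliberately omits fields every transaction also needs, such as trace and span IDs and timestamps.

```python
from typing import Any, Dict, List


def build_carrier_payload(spans: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the fields shared by all carrier transactions, per the specification."""
    return {
        "transaction": "CarrierTransaction",
        "transaction_info": {"source": "component"},
        "contexts": {"trace": {"op": "carrier", "origin": "auto.carrier"}},
        "spans": list(spans),  # copy, so later changes to the caller's list don't leak in
    }


payload = build_carrier_payload([{"op": "http.client"}])
print(payload["contexts"]["trace"]["origin"])  # auto.carrier
```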

#### General behaviour

Scenario: No transaction bound to scope
  Given the SDK has no transaction bound to the scope
  When the SDK starts a new auto-generated span
  Then the SDK starts a new carrier transaction
  And binds it to the scope
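
The scenario above boils down to a get-or-create on the scope. A minimal sketch with hypothetical `Scope`, `Transaction`, and `start_auto_span` names; for brevity it models spans as plain operation strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transaction:
    name: str
    op: str
    spans: List[str] = field(default_factory=list)  # span operations, for brevity


@dataclass
class Scope:
    transaction: Optional[Transaction] = None


def start_auto_span(scope: Scope, span_op: str) -> Transaction:
    """Attach an auto-generated span, starting and binding a carrier if needed."""
    if scope.transaction is None:
        # No transaction bound to the scope: start a carrier and bind it.
        scope.transaction = Transaction(name="CarrierTransaction", op="carrier")
    scope.transaction.spans.append(span_op)
    return scope.transaction


scope = Scope()
first = start_auto_span(scope, "file.read")
second = start_auto_span(scope, "http.client")
print(first is second, first.spans)  # True ['file.read', 'http.client']
```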

Scenario: Timer times out
  Given an active carrier transaction bound to the scope
  And all its spans are finished
  When the transaction times out
  Then the SDK finishes the transaction
  And removes the carrier transaction from the scope

Scenario: Timer times out with unfinished spans
  Given an active carrier transaction bound to the scope
  And one or more of its spans are not finished
  When the transaction times out
  Then the SDK waits for the spans to finish
  And removes the carrier transaction from the scope
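
Both timeout scenarios can share one handler: the carrier always leaves the scope, and what happens next depends on whether spans are still open. A sketch with hypothetical names that tracks only a count of unfinished spans.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Carrier:
    unfinished_spans: int
    finished: bool = False
    waiting_for_children: bool = False


@dataclass
class Scope:
    transaction: Optional[Carrier] = None


def on_idle_timeout(scope: Scope, carrier: Carrier) -> None:
    """Handle the idle timer firing for `carrier`."""
    if scope.transaction is carrier:
        scope.transaction = None  # in both scenarios the carrier leaves the scope
    if carrier.unfinished_spans == 0:
        carrier.finished = True  # all spans finished: finish the transaction now
    else:
        carrier.waiting_for_children = True  # keep waiting for the open spans


done = Carrier(unfinished_spans=0)
scope = Scope(done)
on_idle_timeout(scope, done)
print(scope.transaction, done.finished)  # None True
```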

Scenario: Waiting for unfinished spans that never finish
  Given a carrier transaction not bound to the scope
  And it is waiting for spans to finish
  When the spans don't finish within the deadline timeout
  Then the SDK finishes all unfinished spans with the status deadline_exceeded

Scenario: All spans finish when waiting for unfinished spans
  Given a carrier transaction not bound to the scope
  And it is waiting for spans to finish
  When all spans finish
  Then the SDK finishes the transaction
  And sends it to Sentry
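
The two scenarios for an unbound carrier that waits for its spans can be sketched together. The names are hypothetical, `outbox` stands in for sending to Sentry, and one detail is an assumption: after force-finishing spans at the deadline, the sketch reuses the all-spans-finished path, so the transaction is finished and sent as well. `deadline_exceeded` is the string form of Sentry's deadline-exceeded span status; the default `ok` status is illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    op: str
    finished: bool = False
    status: str = "ok"


@dataclass
class WaitingCarrier:
    """A carrier that is no longer bound to the scope and waits for its spans."""

    spans: List[Span] = field(default_factory=list)
    finished: bool = False


def try_finish(carrier: WaitingCarrier, outbox: List[WaitingCarrier]) -> None:
    # All spans finished: finish the transaction and "send" it exactly once.
    if not carrier.finished and all(span.finished for span in carrier.spans):
        carrier.finished = True
        outbox.append(carrier)


def on_deadline(carrier: WaitingCarrier, outbox: List[WaitingCarrier]) -> None:
    # Spans that never finished are finished with the status deadline_exceeded.
    for span in carrier.spans:
        if not span.finished:
            span.finished = True
            span.status = "deadline_exceeded"
    try_finish(carrier, outbox)  # assumption: finish and send once all spans are done


outbox: List[WaitingCarrier] = []
carrier = WaitingCarrier(spans=[Span("file.read", finished=True), Span("http.client")])
try_finish(carrier, outbox)
print(len(outbox))  # 0
on_deadline(carrier, outbox)
print(len(outbox), carrier.spans[1].status)  # 1 deadline_exceeded
```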

Scenario: New carrier transaction while an unbound carrier transaction waits for spans
  Given a carrier transaction waiting for its spans
  And the carrier transaction is not bound to the scope
  When the SDK starts a new auto-generated span
  Then the SDK starts a new carrier transaction
  And binds it to the scope
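
The key point of this scenario is that an unbound carrier never receives new spans, even while it is still waiting. A sketch with hypothetical names, where `unbind` stands in for any event that removes the carrier from the scope, such as the idle timeout.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Carrier:
    spans: List[str] = field(default_factory=list)


@dataclass
class Scope:
    transaction: Optional[Carrier] = None


def unbind(scope: Scope, waiting: List[Carrier]) -> None:
    # Move the bound carrier to the list of carriers waiting for their spans.
    if scope.transaction is not None:
        waiting.append(scope.transaction)
        scope.transaction = None


def start_auto_span(scope: Scope, span_op: str) -> Carrier:
    # Waiting carriers are ignored here; only the scope decides where spans go.
    if scope.transaction is None:
        scope.transaction = Carrier()
    scope.transaction.spans.append(span_op)
    return scope.transaction


scope, waiting = Scope(), []
first = start_auto_span(scope, "http.client")
unbind(scope, waiting)
second = start_auto_span(scope, "file.read")
print(first is second, first.spans, second.spans)  # False ['http.client'] ['file.read']
```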

Scenario: New screen load transaction
  Given the scope has a carrier transaction bound
  When the SDK starts a new screen load transaction
  Then the SDK removes the carrier transaction from the scope
  And cancels the idle timeout of the carrier transaction
  And waits for all the spans of the carrier transaction to finish
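
A sketch of the hand-over from a carrier to a screen load transaction, with hypothetical types. The scenario does not say what happens to the scope afterwards; the sketch assumes the new screen load transaction is bound to the scope, which fits the earlier statement that the SDK adds spans to carrier or screen load transactions.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Carrier:
    idle_timer_active: bool = True
    waiting_for_children: bool = False


@dataclass
class ScreenLoad:
    screen: str


@dataclass
class Scope:
    transaction: Optional[Union[Carrier, ScreenLoad]] = None


def start_screen_load(scope: Scope, screen: str) -> ScreenLoad:
    """Start a screen load transaction, retiring any bound carrier."""
    current = scope.transaction
    if isinstance(current, Carrier):
        scope.transaction = None  # remove the carrier from the scope
        current.idle_timer_active = False  # cancel its idle timeout
        current.waiting_for_children = True  # but still wait for its spans
    screen_load = ScreenLoad(screen)
    scope.transaction = screen_load  # assumption: screen loads are bound to the scope
    return screen_load


scope = Scope(Carrier())
carrier = scope.transaction
start_screen_load(scope, "CheckoutScreen")
print(carrier.idle_timer_active, carrier.waiting_for_children)  # False True
```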

Scenario: Manually created transaction bound to the scope
  Given an ongoing transaction manually created by the user and bound to the scope
  When the SDK starts a new auto-generated span
  Then the SDK adds the span to the manually created transaction
  And the SDK doesn't wait for spans to finish
  And the SDK doesn't use an idle timeout
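
A sketch contrasting how the same entry point treats a user's transaction and a carrier. The field names (`idle_timeout_s`, `wait_for_children`) are hypothetical; the point is that auto-generated spans join a manually created transaction without changing its lifecycle.

```python
from dataclasses import dataclass, field
from typing import List, Optional

IDLE_TIMEOUT_S = 30.0


@dataclass
class Transaction:
    name: str
    spans: List[str] = field(default_factory=list)
    idle_timeout_s: Optional[float] = None  # None: no idle timeout
    wait_for_children: bool = False


@dataclass
class Scope:
    transaction: Optional[Transaction] = None


def start_auto_span(scope: Scope, span_op: str) -> Transaction:
    if scope.transaction is None:
        # Only carriers get an idle timeout and wait for their children.
        scope.transaction = Transaction(
            "CarrierTransaction", idle_timeout_s=IDLE_TIMEOUT_S, wait_for_children=True
        )
    # A manually created transaction is used as-is; its settings stay untouched.
    scope.transaction.spans.append(span_op)
    return scope.transaction


checkout = Transaction("checkout")  # created manually by the user
scope = Scope(checkout)
print(start_auto_span(scope, "http.client") is checkout)  # True
```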

#### UI event spans (clicks, scrolls, swipes, navigation events)

Scenario: UI event span with no transaction bound to the scope
  Given the SDK has no transaction bound to the scope
  When the SDK starts a new UI event span
  Then the SDK starts a new carrier transaction

Scenario: UI event span with a transaction bound to the scope
  Given the SDK has a transaction bound to the scope
  When the SDK starts a new UI event span
  Then the SDK adds the span to the transaction bound to the scope
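
With carrier transactions enabled, UI events go through the same path as other auto-generated spans. A sketch with hypothetical types; the `ui.action` span label is an illustrative placeholder, and binding the new carrier to the scope follows the general "No transaction bound to scope" scenario.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transaction:
    name: str
    spans: List[str] = field(default_factory=list)


@dataclass
class Scope:
    transaction: Optional[Transaction] = None


def on_ui_event(scope: Scope, event: str) -> Transaction:
    """Record a click, scroll, swipe, or navigation event as a span."""
    if scope.transaction is None:
        # No transaction bound: start and bind a carrier instead of a UI event transaction.
        scope.transaction = Transaction("CarrierTransaction")
    # Otherwise the span joins whatever is bound: a carrier, screen load, or manual transaction.
    scope.transaction.spans.append(f"ui.action {event}")
    return scope.transaction


screen = Scope(Transaction("MainActivity"))  # e.g. a screen load transaction
print(on_ui_event(screen, "click").name)  # MainActivity
print(on_ui_event(Scope(), "swipe").name)  # CarrierTransaction
```
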

philipphofmann commented 1 year ago

As discussed with @kahest, @markushi, @shruthilayaj, @narsaynorath, and @phacops, we will close this issue and instead use span ingestion, as proposed in this PR: https://github.com/getsentry/relay/pull/2620.