element-hq / element-meta

Shared/meta documentation and project artefacts for Element clients
75 stars 12 forks source link

Posthog: Add more properties to unable to decrypt errors to allow more filtering options #2332

Closed BillCarsonFr closed 5 months ago

BillCarsonFr commented 8 months ago

Currently the captured Error events for unable to decrypt errors do not contain much information. This lack of information is preventing us from getting interesting insights from the raw data.

Goal: We want to add some new properties that will allow us to better filter out Posthog insights.

List of needed properties:

### Tasks
- [ ] https://github.com/matrix-org/matrix-analytics-events/pull/92
- [ ] https://github.com/matrix-org/matrix-analytics-events/pull/93
- [ ] https://github.com/matrix-org/matrix-analytics-events/pull/97
- [ ] https://github.com/element-hq/element-web/issues/27168
- [ ] https://github.com/element-hq/element-android/issues/8779
- [ ] https://github.com/element-hq/element-ios/issues/7767
BillCarsonFr commented 8 months ago

These will also be improvements/addition to https://github.com/element-hq/element-meta/issues/2300

richvdh commented 8 months ago

eventLocalAgeAtDecryptionFailure

Is it better to log the age relative to the session, or to log the session-creation-time and the timestamp on the event as separate fields (and then do the calculation on the posthog side)?

BillCarsonFr commented 8 months ago

eventLocalAgeAtDecryptionFailure

Is it better to log the age relative to the session, or to log the session-creation-time and the timestamp on the event as separate fields (and then do the calculation on the posthog side)?

AFAIK There is no easy way to do that on posthog side (not in the query to display graphs), maybe a posthog plugin where you can pre-process and decorate events.

richvdh commented 8 months ago

isFedarated

isFederated

richvdh commented 8 months ago
  • eventLocalAgeAtDecryptionFailure (numeric, time in seconds)
  • timeToDecrypt (numeric, time in seconds)

It looks like other "time" fields in the analytics events measure time in milliseconds. Please let's try to be consistent.

richvdh commented 8 months ago
  • timeToDecrypt (numeric, time in seconds) : Currently there is a grace period of 4s, after that UTDs are reported. We would like to improve the existing mecanism by adding a timeToDecrypt properties when an event is decrypted late. We should track decryption errors for 1mn (TBD), if events are decrypted during this period we still report an UTD with the time it took to decrypt, after 1mn we report anyhow. Would allow to filter on posthog temporary vs definitive UTD.

Trying to understand this a bit better. Could you explain what will happen under this plan for each of:

BillCarsonFr commented 8 months ago
  • timeToDecrypt (numeric, time in seconds) : Currently there is a grace period of 4s, after that UTDs are reported. We would like to improve the existing mecanism by adding a timeToDecrypt properties when an event is decrypted late. We should track decryption errors for 1mn (TBD), if events are decrypted during this period we still report an UTD with the time it took to decrypt, after 1mn we report anyhow. Would allow to filter on posthog temporary vs definitive UTD.

Trying to understand this a bit better. Could you explain what will happen under this plan for each of:

* An event decrypted in more than 0s but less than 4s (it is not reported to posthog at all?)

* An event decrypted in more than 4s but less than 60s (it is reported to posthog, but with an appropriate `timeToDecrypt` ? How does this relate to [Posthog | Increase decryption failure grace period #2303](https://github.com/element-hq/element-meta/issues/2303)?)

* An event which is still not decrypted after 60s (we give up waiting, and report it to posthog with `timeToDecrypt` set to ... `+inf` ?)

Yes so far I think:

The max wait period has to be defined, the main risk is that this wait time is too long and the client is closed meanwhile and the error not reported at all.

So maybe it should be 30s and not 60s

BillCarsonFr commented 8 months ago
  • eventLocalAgeAtDecryptionFailure (numeric, time in seconds)
  • timeToDecrypt (numeric, time in seconds)

It looks like other "time" fields in the analytics events measure time in milliseconds. Please let's try to be consistent.

Yes, updated

richvdh commented 8 months ago

Yes so far I think:

  • Decrypted under 4s not reported (mostly to be a bit backward compatible)
  • Decrypted before Max Wait time (60s) Report with timeToDecrypt it took
  • If after Max wait time it's still not decrypted we report with a timeTodecrypt > MAX (we could put some +inf value also)

The max wait period has to be defined, the main risk is that this wait time is too long and the client is closed meanwhile and the error not reported at all.

So maybe it should be 30s and not 60s

So do we do this instead of https://github.com/element-hq/element-meta/issues/2303?

BillCarsonFr commented 8 months ago

So do we do this instead of #2303?

I think so, let's confirm at the daily

BillCarsonFr commented 8 months ago

Closed https://github.com/element-hq/element-meta/issues/2303 and do this task instead

BillCarsonFr commented 8 months ago

For UTD that are not resolved after the max_time, we might want to report timeToDecypt=-1 That would allow to filter more easilly

andybalaam commented 7 months ago

@BillCarsonFr will break into separate tasks.

uhoreg commented 5 months ago

@BillCarsonFr All the tasks in this issue are marked as done. Can this issue be closed, or is there more work to do here?

BillCarsonFr commented 5 months ago

yes let's close there is a roundoff for EX