element-hq / element-web

A glossy Matrix collaboration client for the web.
https://element.io
Apache License 2.0
10.69k stars 1.88k forks source link

Posthog: Stop double reporting UTD events when app is relaunched. #27421

Open uhoreg opened 2 weeks ago

uhoreg commented 2 weeks ago

This is the issue for the Element Web portion of https://github.com/element-hq/element-meta/issues/2333

Summary: Element Web does not persist what events have been reported as UTDs, so when it restarts, UTDs get re-reported if it tries to decrypt them again, leading to over-reporting of UTDs.

uhoreg commented 2 weeks ago

Looking at the existing (commented out) code to do this, it tries to store all the tracked event IDs as an array in localStorage. I tested Firefox and Chrome for the storage limit of a single localStorage entry, and they both can store up to about 5200000 characters. Event IDs in v3+ rooms are 44 characters, plus two quotation marks and a comma if we store it as a JSON array, so 47 characters per event ID. That means that we would be able to store 110.6k event IDs. Which is probably enough.

But we should probably have some sort of handling for when we do run out of space. The most reasonable thing to do would be to evict the oldest events (whether by origin_server_ts, or by when we first tried to decrypt the event), with the rationale that if an event is really old, it's less likely that we'd be trying (and failing) to decrypt the event, since the user probably won't be scrolling back that far. But if we're storing a Set as a JSON array, as the current code does, we don't have a timestamp. Storing the events as a map from event ID to timestamp would add 14 characters if we use millisecond-precision, or 11 characters if we use second-precision (including the separator between the event ID and the timestamp). Second-precision is probably good enough, so that would give us 58 characters per event, and we'd be able to store 89.6k events. Which is still probably enough.

The alternatives would be to:

t3chguy commented 2 weeks ago

@uhoreg keep in mind the space is likely not per-key but for the whole Storage object, and there's no way to detect when you're approaching the limit, only when you cross it. The quota estimation API is often wildly wrong.

uhoreg commented 2 weeks ago

Yeah, one thing we could do is to impose our own limit on how many keys are stored. It would be bad if we used up all of localStorage for this, and then something more important tried to use localStorage and failed.

Another alternative that was suggested was to use a probabilistic data structure such as Bloom filters.