atom / telemetry

sends usage metrics to GitHub's internal analytics pipeline
MIT License
11 stars 12 forks source link

Initial postmortem on telemetry package for editor tools typescript/javascript clients #21

Open shana opened 6 years ago

shana commented 6 years ago

Postmortem on telemetry package for editor tools typescript/javascript clients

When adapting the atom-telemetry package to be used by the vscode extension, here's the issues I encountered and the solutions I came up with.

When referring to the atom-telemetry package, I'll just refer to it as "the package".

VSCode doesn't expose local storage

The package relies on local storage to store information such as the last time stats were submitted. It also uses lokijs as a database backend for the telemetry data, and I'm pretty sure that also uses local storage as the default backend.

VSCode doesn't expose localStorage; instead it provides state objects that can be used to store arbitrary information, either globally for all workspaces, or on a per-workspace basis.

Solution

Make the database backend implement an interface, and allow clients to bypass the default implementation by providing their own implementation of the database interface in the constructor. The default implementation is still using lokijs and local storage, and tests run with that.

Other settings that use local storage were also moved into a separate interface. The default implementation can be overridden in the same way as above.

Multiple instances of the same editor recording telemetry can trample each other's data

It is possible for multiple telemetry clients to overwrite or trample each other's data because the data is meant to be stored in a way that's shared across all clients. When reading or writing measures, for instance, the database backend stores the whole collection without any specific identifiers that connect that collection to a particular client. If two instances of an editor want to record telemetry at the same time, there is a potential for a race on who gets to read/write the data first.

I doubt lokijs is doing global locking on accessing data, so individual data trampling is a potential issue.

Double submission of metrics is also an issue, if two instances are running at more or less the same time.

Solution

Each client records its own metrics with a unique identifier valid for the session (i.e., not stored anywhere but in memory). The data is stored in one object instead of four collections, to make lookups easier - this is optional, but it makes the stored data match what gets submitted, so implementations that want to save the data to disk will save the closest format to what central takes without any special handling. A telemetry client reads and updates one IMetrics object containing all the different metrics, keyed by a guid ID unique to that session.

On submission, all the data from previous days is sent as an array. This means there will be multiple entries for the same day (one for each editor session that happened on that day). Whoever starts the submission process first takes ownership of all the data (i.e. sets the lastSubmissionDate flag ahead of time) to make sure no one else attempts to submit the data.

Missing typescript bindings

The package is a typescript app but wasn't generating typescript bindings, making it very hard to use in other typescript apps

Solution

I initially had typescript generate the bindings automatically, but that generated a lot of garbage, so I moved to a manual bindings file, and deduped any types that could be shared between the package implementation and the clients.

rafeca commented 5 years ago

It is possible for multiple telemetry clients to overwrite or trample each other's data because the data is meant to be stored in a way that's shared across all clients. When reading or writing measures, for instance, the database backend stores the whole collection without any specific identifiers that connect that collection to a particular client. If two instances of an editor want to record telemetry at the same time, there is a potential for a race on who gets to read/write the data first.

I'm seeing some of this issues happening at this moment when running multiple instances of Atom, did we get to resolve them? /cc @annthurium