atom / telemetry

sends usage metrics to GitHub's internal analytics pipeline
MIT License
11 stars 12 forks source link

Fix race condition when sending daily stats #25

Closed annthurium closed 5 years ago

annthurium commented 5 years ago

Problem

@rafeca (and @shana) pointed out that there is a race condition with reporting stats. If we have multiple client windows open, we could end up sending the same stats twice. The db is shared across all client instances, and the database is cleared asynchronously after waiting for the request to finish.

Solution

The fix is to add an isReporting class attribute, which we set to true at the beginning of the reportStats method and re-set to false at the end. Since the StatsStore class is a singleton, and nodejs event loop is single threaded*, hopefully this will prevent multiple running windows from trying to send data at the same time.

Alternate approaches

I also considered making each window have its own db instance. Shana also pointed out that there might be race conditions on writes in https://github.com/atom/telemetry/issues/21. However, I did a little research and it looks like lokijs throttles automatic saves when these would cause overlaps with a save currently in progress. This prevents data loss at the cost of less frequent saves. So I think we're ok here, but I'm open to changing this if you disagree.

rafeca commented 5 years ago

Wow this was fast! Thanks for working on this!! πŸŽ‰

The fix is to add an isReporting class attribute, which we set to true at the beginning of the reportStats method and re-set to false at the end. Since the StatsStore class is a singleton, and nodejs event loop is single threaded*, hopefully this will prevent multiple running windows from trying to send data at the same time.

I'm pretty sure that each electron window runs in a separate renderer process, in a similar way that chrome does for each tab, so I don't think that this solution would be effective.

Instead, we can use the localStorage to create a simple mutex, since it is shared between the different renderer processes. That solution would be very similar than this PR, but instead of keeping the isReporting value in memory it should be read/written from localStorage (note that localStorage is not guaranteed to be thread-safe, but in this case this should not be an issue).

rafeca commented 5 years ago

After looking at this I just got curious about lokijs and how it implemented the storage of events and from what I saw by default lokijs is an in-memory database and does not persist the data by default (so each window currently has its own list of events).

This means that race conditions don't really exist at the moment: each window currently keeps its list of events but they "conflict" when checking the last-daily-stats-report data from localStorage.

But if I'm not mistaken, this also means that we're currently losing all the events that haven't been sent once the Atom window gets closed, and since we're only sending events once Atom has been opened for 4h, we're losing any event from sessions shorter than 4h.

Using a persistance adapter on LokiJS (as mentioned in the docs) would fix this specific problem, but then we would introduce the concurrency issues when having multiple windows open, since lokijs only syncs the data from the persistance adapter on load:

An important distinction between an in-memory database like lokijs and traditional database systems is that all documents/records are kept in memory and are not loaded as needed. Persistence is therefore only for saving and restoring the state of this in-memory database.

So I see two potential paths forward (there may be more options):

annthurium commented 5 years ago

lokijs is an in-memory database and does not persist the data by default (so each window currently has its own list of events).

πŸ€¦β€β™‚οΈI did not even realize that lokijs is in memory and doesn't persist. Thanks for bringing that up! A lesson to me to investigate tools more deeply.

Use a global db instance for all the windows but move away from lokijs and use a system that ensures that multiple concurrent clients are handled correctly (even by accessing IndexedDB directly). This could cause some performance regressions if we have many events, since for every read/write we would need to access the storage system (and serialize/deserialize the data).

This seems simpler to implement, since I can imagine more concurrency bugs that might shake out of the other approach, if we have multiple windows that are writing to disk when they close, and also trying to send stats.

Since I don't love the plain indexdb api, I evaluated two other indexdb wrappers.

pouchDB

pros

cons

dexie

https://dexie.org/docs/API-Reference#quick-reference

pros

cons

Dexie is the clear winner here, so I'll start implementing a move from lokijs to dexie.

@rafeca please let me know if I'm missing anything, or if you have any concerns about this approach. Thanks for your help!

rafeca commented 5 years ago

Hey @annthurium, sorry for the delayed response, I was on PTO until today πŸ–

Thanks for investigating potential solutions and writing down the differences! I agree with your outcome and dexie looks like a good solution (alternatively we could use directly the IndexedDB API to save the 55KB since we don't need complex queries logic, but that API is a nightmare to work with πŸ˜…, and as you mention 55KB is not a big deal for Atom).

rafeca commented 5 years ago

Closing this PR in favor of https://github.com/atom/telemetry/pull/29