how do we integrate `telemetry` into atom core?

annthurium commented 6 years ago

do we want to send one daily metrics request for all of atom? Or does each package send its own metrics?

-if the former, we can:

make StatsStore a singleton and then you can construct a new one where you need it
Desktop current uses a setInterval loop to determine when to send stats. Is that the pattern we want to follow? If so, where do we put this loop? Do we put this in atom core somewhere on start up? If so, what kind of api would make sense to expose this through atom core for other packages?

annthurium commented 6 years ago

thought: we should move the logic where we are sending the metrics inside of the telemetry package because that simplifies some things.

annthurium commented 6 years ago

here are the things I talked about with @jasonrudolph today:

[x] Figure out if anything uses sendEvent's label and value params (since we only imagine telemetry taking an action name, not a label and value) (@jasonrudolph) -- https://github.com/atom/telemetry/issues/8#issuecomment-396641028
[x] Ask jnunemaker if Octolytics currently has (or aspires to have) the ability to record timing data (@jasonrudolph)
[x] Decaffeinate metrics package (@jasonrudolph)
[x] Make existing service that already is exposed through metrics package also send data through telemetry after telemetry is public
[ ] Consume metrics service in the github package to count commits
[x] Ask Thomas if:
- the business analytics tooling (i.e.,, octolytics, mode, presto) supports timing measurements
- data loss / jnunemaker's concern
[ ] telemetry should check for an internet connection. (navigator.onLine)
[ ] Send GitHub package token with central. We don't want to block on having a unified auth flow for Atom, and when we build a unified auth token for Atom we can revisit this.
[ ] Implement timings in telemetry.
[ ] Investigate how to keep lokijs (file backed storage for measures) from balooning in size if we are storing too much data when we do timing. We don't anticipate a problem with counters here. Also this isn't a problem if we decide to send events as they happen instead of aggregating.
[x] Central changes to add Atom to the list of whitelisted apps (@smashwilson)
[ ] Central changes to make the /usage endpoint authenticated (@shana)

bitmoji

jasonrudolph commented 6 years ago

[ ] Decaffeinate metrics package (@jasonrudolph)

I opened https://github.com/atom/metrics/pull/88 to tackle this work. It's code complete and all the tests are passing.

Next up, I need to figure out how to do a smoke test to make sure that it still works end-to-end when running inside Atom (i.e., that it correctly captures metrics and sends them to Google Analytics).

jasonrudolph commented 6 years ago

Ask jnunemaker if Octolytics currently has (or aspires to have) the ability to record timing data (@jasonrudolph)

I chatted with @jnunemaker about this today. (Thanks for the chat, @jnunemaker!)

The short answer is that Octolytics already supports timings. It does so using measures. For example, if we wanted to capture the time it takes to load the window, and we observed that it took 42 milliseconds to load the window, we could send an event to Octolytics with a measure of window_load_time: 42.

@jnunemaker shared some additional context that might be useful to us as well. I'm hoping to summarize that and share it soon.

jasonrudolph commented 6 years ago

The short answer is that Octolytics already supports timings. It does so using measures. ... @jnunemaker shared some additional context that might be useful to us as well. I'm hoping to summarize that and share it soon.

Circling back around to this. Here's the additional context I mentioned:

A timing example

John pointed me toward an example usage where github.com records window performance timings: https://github.com/github/github/blob/bc643bff73/app/assets/modules/github/collector-api.js#L266-L291

Aggregate vs. raw data

tl;dr "You can never de-aggregate data, so the more raw the better." -- @jnunemaker

Today, Atom reports metrics as they happen. For example, when you open a file, Atom reports a file open event immediately. It also includes the file's grammar. And since Atom is sending the event immediately, the metrics include the timestamp of the event.

In its current state, atom/telemetry currently aggregates data locally and sends the aggregated data once per day. (As I understand it, atom/telemetry uses this approach in order to be consistent with the approach used by GitHub Desktop.) If we tracked file open events in this way, we'd likely use a file_open_count measure. We'd lose the timestamp data for each file open event (which might be fine), but it's unclear to me how we'd capture the grammars associated with each file open event. 🤔

With this in mind, for richer data, John suggests that we consider using events instead of counts when we want to track how often an action is occurring:

For example, instead of using a commit_count measure, we could send a commit event each time the user performs a commit. Then, to get the count, we'd count the number of commit events.
This richer data can include things like the event timestamp (which we can't get if we aggregate the data and send a single commit count) and would give us the flexibility to include related measures like lines added, lines deleted, lines modified. For example, see the data sent to Octolytics by the github.com post receive job:
- https://github.com/github/github/blob/bc643bff73/jobs/post_receive.rb#L446-L449
- https://github.com/github/github/blob/bc643bff73/test/jobs/post_receive_test.rb#L649-L652

@annthurium: I hope this perspective is helpful. I'd love to sync up with you soon to talk through our options.

jasonrudolph commented 6 years ago

[ ] Figure out if anything uses sendEvent's label and value params (since we only imagine telemetry taking an action name, not a label and value) (@jasonrudolph)

I see 4 callsites that use either the label param or the value param. I see zero callsites that use both params. In other words, calls to sendEvent include a category and an action, and sometimes also one additional piece of data (either a label or a value). I've described the callsites and a proposed approach for each one below.

@annthurium: What do you think of the approach proposed below?

Callsites

https://github.com/atom/metrics/blob/a6c70ae/lib/metrics.js#L43

Parameters:
- category: 'window'
- action: 'ended'
- label: null
- value: a number indicating length of session
Proposal: Consider recording this as a timing instead of an event.
https://github.com/atom/metrics/blob/a6c70ae/lib/metrics.js#L57

Parameters:
- category: 'file'
- action: 'open'
- label: string representing grammar name
- value: undefined
Proposal: Teach telemetry to support events with metadata attributes that can be defined at runtime. For example, here, we could record an event with the following attributes:
```
{
  "category": "file",
  "action": "open",
  "grammar": "text.html.php"
}
```
https://github.com/atom/metrics/blob/a6c70ae/lib/metrics.js#L63

Parameters:
- category: 'setting'
- action: 'core.telemetryConsent'
- label: 'no' or 'limited'
- value: undefined
Proposal: Same as previous callsite. For example, here, we could record an event with the following attributes:
```
{
  "category": "setting",
  "name": "core.telemetryConsent",
  "value": "limited"
}
```
https://github.com/atom/metrics/blob/a6c70ae/lib/metrics.js#L181

Parameters:
- category: 'deprecation-v3'
- action: string indicating package name and version (e.g., 'markdown-scroll-sync@2.1.2')
- label: string containing the full deprecation message [screenshot]
- value: undefined
Proposal: Same as previous callsite. For example, here, we could record an event with the following attributes:
```
{
  "category": "deprecation-v3",
  "package": "markdown-scroll-sync@2.1.2",
  "message": "The contents of `atom-text-editor` elements are no longer encapsulated within a shadow DOM boundary. Please, stop using `shadowRoot` and access the editor contents directly instead."
}
```

annthurium commented 6 years ago

@jasonrudolph thanks for taking the time to investigate this!

For 3., telemetry already does have an event for consent opt ins / opt outs, so we could just use that instead of defining an event with custom metadata. The rest of the proposal sounds great!

jasonrudolph commented 6 years ago

Over in https://github.com/atom/atom/issues/17501, we have a task to:

[ ] Decide whether we'll send metrics to telemetry in real-time or in batches

After having chatted with @annthurium earlier this week, I think we have a plan in place:

Telemetry will accept metrics in two forms:
- simple usage counts that get aggregated on a daily basis (e.g., the number of times the app was opened on a particular day)
- raw (non-aggregated) events, where each event includes the timestamp of the event and arbitrary metadata (e.g., a file-open event at time t with a 'grammar' attribute whose value is 'source.js')
Telemetry will report these metrics to the GitHub metrics back-end in batches, not in real-time. This reporting will likely occur once per day.

@annthurium: Have I captured that accurately?

annthurium commented 6 years ago

@jasonrudolph: yes, that looks accurate, thank you.

atom / telemetry