Open findleyr opened 3 weeks ago
We're not getting user counts from this data, so I'm not sure why we care about 28-day users. We'd only see something that old if someone used the go command three weeks ago and then not again until today. We do want to cap how long we need to wait for a given week's official record to stabilize, and 28 days seems very long.
It seems like we should flip in the other direction and reject uploads > 8 days old, at least if "old" is defined as today minus end-of-week. (For today minus start-of-week, 8 days old would really only be a 1-day-old cutoff.)
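To make the two definitions of "old" concrete, here is a minimal sketch of the cutoff check. It is not the telemetry server's actual code: `date`, `ageInDays`, `tooOld`, and `maxUploadAgeDays` are hypothetical names, and it assumes the week's start and end dates are 7 days apart, which is what the "1-day-old cutoff" arithmetic above implies.

```go
package main

import "time"

// maxUploadAgeDays is the cutoff proposed in this thread (hypothetical name).
const maxUploadAgeDays = 8

// date builds a UTC midnight timestamp; a helper for illustration only.
func date(year, month, day int) time.Time {
	return time.Date(year, time.Month(month), day, 0, 0, 0, 0, time.UTC)
}

// ageInDays returns the whole number of days from ref to today.
// Both values are expected to be UTC midnights, so there is no DST skew.
func ageInDays(ref, today time.Time) int {
	return int(today.Sub(ref).Hours() / 24)
}

// tooOld reports whether a report should be rejected, with "old"
// defined as today minus the report's end-of-week date.
func tooOld(weekEnd, today time.Time) bool {
	return ageInDays(weekEnd, today) > maxUploadAgeDays
}
```

With a week ending 2024-01-14 and today being 2024-01-16, `tooOld` accepts the report (age 2 days). Measured from an assumed start of 2024-01-07 instead, the same report is already 9 days old, so an 8-day limit on that basis would reject it after roughly one day.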
Some responses inline below, but rejecting uploads > 8 days SGTM.
> We're not getting user counts from this data, so I'm not sure why we care about 28-day-users.

We estimate the number of 28-day users by other means, and I thought it would be useful to be able to measure the experience of this population. Of course, various other biases prevent us from accurately measuring the 28-day user population (for example, infrequent users are probably less likely to opt in to telemetry).
> We do want to cap how long we need to wait for a given week's official record to stabilize
I don't actually have strong opinions about where we put the cutoff. 8 days sounds reasonable. Eyeballing recent data, it looks like we get >90% of reports within a week, so let's go with this. CC @fflewddur for awareness.
I did more investigation, and discovered that there are more reasons why older upload dates have relatively more "late data":
These three factors contribute to having much more "late" data in Dec 2023 / Jan 2024. Looking at recent data, this is significantly less of a problem now.
Sounds good, thanks.
Right now, we have a nightly job that produces merged reports (those available at telemetry.go.dev/data) for the last 8 days. Yet the client may upload data for the last 21 days.
This has a few problems: