davidbau / covid-19-chart

Chart of current COVID-19 time series data. Enables a variety of county-, state-, and nation-level comparisons and data exploration.
https://covid19chart.org/

Use a sliding window for deltas #42

Closed davidebbo closed 4 years ago

davidebbo commented 4 years ago

This is not quite ready, but it is up for initial discussion about #40.

My reasoning is that too much choice is bad, and that maybe we should only offer sliding deltas, since plain deltas are so noisy. We could rename the option to sliding deltas.

I made the number of days we look back configurable (in code only), as deltaInterval. So please experiment by changing this hard-coded value. Setting it to 1 gives the old behavior. I set it to 4 for now, which significantly smooths the delta graphs (while keeping their overall shape).
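As a rough illustration of the idea, here is a minimal sketch of a backward-looking sliding delta. The function name `slidingDeltas`, the input shape (an array of cumulative counts, one per day), and the choice to return `null` where there is not enough history are all assumptions made for this example, not the chart's actual code. Only `deltaInterval` and the `i - deltaInterval` lookback come from the discussion.

```javascript
// Hypothetical helper, not the chart's actual implementation.
// cumulative: cumulative counts, one entry per day.
// deltaInterval: number of days to look back (1 reproduces plain daily deltas).
function slidingDeltas(cumulative, deltaInterval) {
  const result = [];
  for (let i = 0; i < cumulative.length; i++) {
    if (i < deltaInterval) {
      // Assumption: emit null when there is not enough history yet.
      result.push(null);
    } else {
      // Average daily change between day i - deltaInterval and day i.
      result.push((cumulative[i] - cumulative[i - deltaInterval]) / deltaInterval);
    }
  }
  return result;
}

// Made-up sample data where reports arrive in bursts every other day.
const sampleCounts = [0, 10, 10, 40, 40, 80, 80];
console.log(slidingDeltas(sampleCounts, 1)); // plain deltas, noisy
console.log(slidingDeltas(sampleCounts, 4)); // averaged over 4 days
```

With the made-up data above, the interval of 1 alternates between zero and large jumps, while the interval of 4 produces a steadier series at the cost of a few leading days with no value.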

But I'd like to discuss a small problem: because I take the average delta between the first and last day of the interval, it would be a more accurate estimate for the middle day. But right now, I use it as the value for the last day. So if the deltas are increasing a lot, we end up with a number that's a little low for the date it's assigned to (and a little high if they are going down). A potential solution is to center the sliding window on the current date, and I could try that (not a hard change). That still leaves a pain point at the end of the graph, since there is no 'future' data. And obviously, this problem gets worse with larger sliding windows. There are probably some crazy extrapolation algorithms to deal with that if we care.
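To make the lag concrete, here is a small sketch comparing a trailing window with a centered one on an accelerating series. Both function names and the sample data (cumulative counts equal to the square of the day index) are hypothetical and chosen only so the arithmetic is easy to follow. The centered variant assumes the window is split evenly on both sides of the day.

```javascript
// Hypothetical helpers for illustration only.

// Average daily change over the n days ending at day i.
function trailingDelta(c, i, n) {
  return i - n >= 0 ? (c[i] - c[i - n]) / n : null;
}

// Average daily change over a window centered on day i.
// Returns null near either end, where the window runs off the data.
function centeredDelta(c, i, n) {
  const half = Math.max(1, Math.floor(n / 2));
  if (i - half < 0 || i + half >= c.length) return null;
  return (c[i + half] - c[i - half]) / (2 * half);
}

// Made-up accelerating series: c[i] = i * i, so the daily delta is 2i - 1.
const accelerating = [0, 1, 4, 9, 16, 25, 36, 49, 64];

console.log(accelerating[6] - accelerating[5]);  // actual delta on day 6
console.log(trailingDelta(accelerating, 6, 4));  // trailing estimate, lower
console.log(centeredDelta(accelerating, 6, 4));  // centered estimate, closer
console.log(centeredDelta(accelerating, 8, 4));  // last day: no future data
```

On day 6 the actual delta is 11, the trailing estimate is 8, and the centered estimate is 12. On the last day the centered window has no value at all, which is the end-of-graph problem described above.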

davidbau commented 4 years ago

On whether to label the sliding window with the endpoint or the center, and whether to use a fancy weighting kernel for the window:

I like your code with i-deltaInterval as-is. Simpler-to-explain is better. Backward-looking stats are easy to explain.

For this chart, I'd like to avoid getting into the future-looking business - that's the domain of a predictive model, and something I think we don't want to touch. I think it's simpler to say "avg change over previous 5 days" rather than "avg change over 5-day window centered at the day" (which includes a couple of future days and starts inviting the question of what exactly we mean by putting a number on a day where the future was unknown at that time).

davidbau commented 4 years ago

> too much choice is bad

Also - agreed on this for the user interface. In the code we can maintain flexibility, but then just have one particular choice presented to the user.

davidebbo commented 4 years ago

Thanks for completing it. I played with it and the behavior is great!