CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

Spreadsheet for generating relevant plots and predictive Gaussian models #2123

Open CherylJosie opened 4 years ago

CherylJosie commented 4 years ago

Announcement: I wrote a spreadsheet for visualizing the global time series. I overlaid a Gaussian model to predict the peak deaths/day/100K and estimate total deaths across the pandemic.

Covid-19

The wiki has many sample charts and screenshots, and explains which processed metrics I found useful in estimating/predicting the height, width, and timing of the Gaussian model.

Covid-19/wiki

Plots are generated for the world and also filtered by nation. I included the capability to plot two nations side-by-side for tracking the spread from nation to nation, with variable time shifts to separate traces for clarity if needed.
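For readers who want to reproduce the time-shift idea outside the spreadsheet, here is a minimal Python sketch. It is not taken from the spreadsheet; the function name `shift_series` and the choice to pad with `None` are assumptions made for illustration.

```python
def shift_series(values, days):
    """Shift a daily series by `days` (right if positive, left if negative).

    The result has the same length as the input; positions with no data
    after the shift are filled with None so a plot can skip them.
    """
    values = list(values)
    n = len(values)
    if days >= 0:
        # Pad the front, then trim back to the original length.
        return ([None] * min(days, n) + values)[:n]
    # Drop the first -days entries, then pad the end.
    return values[-days:] + [None] * min(-days, n)


# Hypothetical example: delay one nation's trace by one day.
print(shift_series([1, 2, 3, 4], 1))
```

Shifting one nation's trace this way lines up two outbreaks that started on different dates, or separates two traces that would otherwise overlap.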

I calculated new cases/day/100K and active cases/km², and used these as indicators for estimating the timing and peak of the Gaussian on the deaths/day. I used my existing models of nations that have already peaked (China) to estimate the width (time span) for nations yet to peak. This method is not statistically rigorous the way a least-squares fit and extrapolation would be, but it seems to work roughly.
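To make the height, width, and timing parameters concrete, here is a minimal Python sketch of a Gaussian deaths/day curve. It is an illustration under stated assumptions, not the spreadsheet's own formulas: it treats "width" as the standard deviation in days, and the function and parameter names are hypothetical.

```python
import math


def gaussian_deaths_per_day(day, peak_height, peak_day, width):
    """Modelled deaths/day on `day` for a Gaussian curve.

    peak_height: deaths/day at the peak
    peak_day:    day index of the peak (timing)
    width:       standard deviation in days (assumed meaning of "width")
    """
    return peak_height * math.exp(-((day - peak_day) ** 2) / (2 * width ** 2))


def estimated_total_deaths(peak_height, width):
    """Area under the Gaussian, i.e. total deaths implied by the model."""
    return peak_height * width * math.sqrt(2 * math.pi)


# Hypothetical parameters, not fitted to any real nation.
total = estimated_total_deaths(peak_height=5.0, width=10.0)
print(round(total, 1))
```

Because the total is just the area under the curve, an error in the estimated width carries over proportionally into the estimated total deaths.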

The metrics are displayed as raw numbers and also per capita and per km², so that they reflect infection density. I found this a more useful way to visualize the spread of the pandemic and evaluate the effectiveness of mitigation efforts than relying exclusively on the raw time series of absolute counts.
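The normalization can be sketched in a few lines of Python. This is an illustration only, not the spreadsheet's formulas: the function names are hypothetical, and population and area figures would have to come from a separate source, since the time series carries only cumulative counts.

```python
def daily_new(cumulative):
    """Convert a cumulative count series into day-over-day increases."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def per_100k(count, population):
    """Scale a count to a rate per 100,000 inhabitants."""
    return count * 100_000 / population


def per_km2(count, area_km2):
    """Scale a count by land area to get a density per square kilometre."""
    return count / area_km2


# Hypothetical numbers: cumulative cases over four days in a nation of 1M people.
new_cases = daily_new([1, 3, 6, 10])
print([per_100k(c, 1_000_000) for c in new_cases])
```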

I calculated two death-percentage metrics, one based on confirmed cases and one based on resolved cases. The latter can be suppressed if the resolved data is not trusted.
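A minimal Python sketch of the two percentages follows. It assumes "resolved" means deaths plus recovered, which is an interpretation rather than something stated above; the function names and the `trust_resolved` flag are hypothetical.

```python
def death_pct_of_confirmed(deaths, confirmed):
    """Deaths as a percentage of all confirmed cases (None if no cases)."""
    if confirmed == 0:
        return None
    return 100.0 * deaths / confirmed


def death_pct_of_resolved(deaths, recovered, trust_resolved=True):
    """Deaths as a percentage of resolved cases.

    Assumes resolved = deaths + recovered. Returns None when the metric
    is suppressed or when there are no resolved cases yet.
    """
    resolved = deaths + recovered
    if not trust_resolved or resolved == 0:
        return None
    return 100.0 * deaths / resolved


# Hypothetical counts.
print(death_pct_of_confirmed(50, 1000), death_pct_of_resolved(50, 150))
```

While an outbreak is still growing, the confirmed-based figure tends to read low because many open cases have not resolved yet, and the resolved-based figure depends on how reliably recoveries are reported.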

The README has additional details, and there is an included instruction sheet in the spreadsheet that explains how to use it.

You shouldn't have too much trouble modifying the spreadsheet for the US-specific data. I might do that some time, but if you really need it I suggest you go for it rather than wait on me.

I documented my progress on my Facebook. Here are a couple of sample posts:

Covid-19 Series Part 23: The Peak

Covid-19 Series Part 15: How reliable is the data?

I know you guys have been struggling to build this functionality in relational databases. I took a shortcut and hacked it into a spreadsheet with data import and model fitting done manually, and then I took longer than I hoped posting it online because Git is new to me. But... it's here now.

Thanks so much for publishing the time series, Ryan. Your efforts inspired me.

Cheryl

ivanMSC commented 4 years ago

How is this an issue related to this repo? This is not your personal blog.

CherylJosie commented 4 years ago

> How is this an issue related to this repo? This is not your personal blog.

Thanks for the warm welcome to GitHub.