covidatlas / coronadatascraper

COVID-19 Coronavirus data scraped from government and curated data sources.
https://coronadatascraper.com
BSD 2-Clause "Simplified" License

Active cases data #389

Closed. zbraniecki closed this issue 4 years ago

zbraniecki commented 4 years ago

It seems that since the JHU change to recovered data, we have struggled to gather active cases data.

Is that going to be impossible going forward or can we find sources for that data for major regions?

active is loosely related to recovered (see #388), but it cannot necessarily be calculated from it. The formula active = cases - deaths - recovered is flawed. There are proposals for better calculations, such as https://github.com/CSSEGISandData/COVID-19/issues/1250#issuecomment-604475689 and the comments that follow it.
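To make the concern concrete, here is a minimal Python sketch of the subtraction mentioned above. It is not the scraper's actual code: the function names are hypothetical, and the numbers in the usage note are made up for illustration. It shows one way the formula misleads: when a source stops reporting recovered and the gap is silently treated as zero, every past case that did not end in death is counted as still active.

```python
from typing import Optional


def naive_active(cases: int, deaths: int, recovered: Optional[int]) -> Optional[int]:
    """Return cases - deaths - recovered, or None when recovered is unknown.

    Hypothetical helper: refusing to guess keeps a missing recovered count
    from turning into a misleading active count.
    """
    if recovered is None:
        return None
    return cases - deaths - recovered


def naive_active_zero_fill(cases: int, deaths: int, recovered: Optional[int]) -> int:
    """Same subtraction, but a missing recovered count is treated as zero.

    Hypothetical helper that demonstrates the failure mode: the result
    overstates active cases whenever recovered is simply not reported.
    """
    return cases - deaths - (recovered or 0)
```

With illustrative inputs of 1000 cases, 20 deaths, and 300 recovered, both helpers give 680. If recovered is missing, the first returns None while the second returns 980, which is the inflated figure the naive formula produces when the gap is papered over.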

lazd commented 4 years ago

Thanks for all the well thought-out issues, @zbraniecki! Keep them coming.

We are not going to forecast active cases based on a time period, we just aggregate data.

active and recovered are super important, but in the absence of that data coming directly from the sources we scrape, it will not be included in our dataset.

Of course, layering this on top of our dataset for your own visualizations and predictive models is a fine thing to do, but we won't be doing it within the dataset.

zbraniecki commented 4 years ago

"We are not going to forecast active cases based on a time period, we just aggregate data."

That's not true. As you mentioned in #388, you calculate active as cases - deaths - recovered, no?

I'd argue that there are better algorithms for that, so shouldn't we stop calculating it and only include active when the source provides it?
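A minimal Python sketch of that "only if the source provides it" policy, under stated assumptions: the function name `pass_through` and the flat dictionary row shape are hypothetical, not the project's real data model, and the field names are simply the ones used in this thread.

```python
from typing import Dict, Optional

# Field names as used in this thread; the row shape is an assumption.
FIELDS = ("cases", "deaths", "recovered", "active")


def pass_through(source_row: Dict[str, Optional[int]]) -> Dict[str, int]:
    """Copy only the counts the source actually reported; derive nothing.

    A field that is absent or None in the source stays absent in the
    output, so active is never back-filled from the other counts.
    """
    return {
        name: source_row[name]
        for name in FIELDS
        if source_row.get(name) is not None
    }
```

Under this sketch, a row that reports only cases and deaths comes out with only those two fields, and a row that reports active keeps the source's own number, even if it disagrees with cases - deaths - recovered.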

zbraniecki commented 4 years ago

In which case, I guess, I'm arguing for the same thing as in #388: turn this issue into a place to coordinate gathering sources for active cases, since JHU stopped providing them.

zbraniecki commented 4 years ago

This source still seems to show active: https://www.worldometers.info/coronavirus/