CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

Upcoming changes in time series tables #1250

Open CSSEGISandData opened 4 years ago

CSSEGISandData commented 4 years ago

We will update the time series tables in the following days, aiming to provide a cleaner and more organized dataset consistent with our new/current naming convention. We will also be reporting a new variable (i.e., testing), as well as data at the county level for the US. All files will continue to be updated daily around 11:59PM UTC.

The following specific changes will be made:

Thanks!

Update: time_series_covid19_recovered_global.csv is added.

DataChant commented 4 years ago

Will recovered cases still be reported on the daily CSV files? Will they reflect the daily recovered or aggregated?

CSSEGISandData commented 4 years ago

@DataChant No recovered cases will be reported in the daily reports and the time series tables.

Update: we newly added recovered time series table for most countries. Thanks!
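On the daily-vs-aggregated question: the time series tables hold running totals, so per-day figures have to be derived from them. A minimal sketch, assuming a plain list of cumulative counts for one location in date order (the function name is ours, not part of the dataset):

```python
def daily_from_cumulative(cumulative):
    """Turn a cumulative series into per-day new counts.

    The first day is taken as-is. Downward revisions in the source
    show up as negative values instead of being hidden.
    """
    daily = []
    previous = 0
    for value in cumulative:
        daily.append(value - previous)
        previous = value
    return daily


print(daily_from_cumulative([0, 2, 5, 5, 9]))  # [0, 2, 3, 0, 4]
```

Keeping negative values visible makes it easier to spot days where the source corrected an earlier total.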

paolinic03 commented 4 years ago

Woah, major news. Let’s do this. Bummed about no recovered, but it seems to be difficult to collect. County level data is going to be massive. Thank you

billyburgoa commented 4 years ago

Thanks for your work. I'd like to know why you won't report or provide recovered cases.

CSSEGISandData commented 4 years ago

No reliable data source reporting recovered cases for many countries, such as the US.

ryanwoconnor commented 4 years ago

Can you please provide us a date/time for that cutover? Can we place these new files into a different folder and leave the old files in place? That way, the dashboards we may have running won't be full of errors when the cutover happens.

Thank you, Ryan

DMiradakis commented 4 years ago

Thanks so much! I'm making a Power BI Report now, so it's good to know about these upcoming changes!

bevanward commented 4 years ago

Thanks @CSSEGISandData With respect to your second bullet point, will Province/State remain for countries (excluding US) where you can source the data?

Changes look good - thanks for all the hard work - this is a very important data set! Bevan

christophGeoHealthCentre commented 4 years ago

How do you count active cases without having recovered available?

paolinic03 commented 4 years ago

You don’t, just confirmed, deaths, and testing.

DMiradakis commented 4 years ago

How do you count active cases without having recovered available?

I'm just grouping the difference together into a group called "Active or Recovered". Like @paolinic03 said, it's the best we can do for the moment.
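The "Active or Recovered" bucket is simply confirmed minus deaths. A minimal sketch for one location and one day, with a hypothetical function name and made-up numbers:

```python
def active_or_recovered(confirmed, deaths):
    """Return the combined 'Active or Recovered' bucket.

    Without a recovered count, active and recovered cases cannot be
    separated, so every confirmed case that is not a death lands here.
    """
    if deaths > confirmed:
        raise ValueError("deaths cannot exceed confirmed cases")
    return confirmed - deaths


# Made-up cumulative counts for one location on one day.
print(active_or_recovered(confirmed=1000, deaths=30))  # 970
```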

analyzewithpower commented 4 years ago

THANK YOU!!! :)

shahesam84 commented 4 years ago

Will there be a release for those mentioned tables today? I don't see US tables yet.

aatishb commented 4 years ago

Thank you for this. This really is an amazing resource, and I'm excited for these changes. I recommend pinning this issue so that folks don't miss it. https://help.github.com/en/github/managing-your-work-on-github/pinning-an-issue-to-your-repository

piccolbo commented 4 years ago

US state data is being removed. Will the US state confirmed cases and deaths be obtained by aggregating over the counties? If so, are you going to provide the state as a separate column or as part of the county name? Thanks

analyzewithpower commented 4 years ago

US state data is being removed. Will the US state confirmed cases and deaths be obtained by aggregating over the counties? If so, are you going to provide the state as a separate column or as part of the county name? Thanks

You can parse state from the county name with some quick data transformation...
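One way to do that transformation, assuming labels of the hypothetical form "County, ST" (the exact format in the files may differ), is to split on the last comma and sum by state. The function names and sample rows below are illustrative only:

```python
from collections import defaultdict


def split_county_state(name):
    """Split a hypothetical 'County, ST' label into (county, state)."""
    county, sep, state = name.rpartition(",")
    if not sep:
        raise ValueError(f"no state suffix in {name!r}")
    return county.strip(), state.strip()


def totals_by_state(rows):
    """Sum (label, count) pairs into per-state totals."""
    totals = defaultdict(int)
    for name, cases in rows:
        _, state = split_county_state(name)
        totals[state] += cases
    return dict(totals)


rows = [("Kings County, NY", 10), ("Westchester County, NY", 5), ("King County, WA", 7)]
print(totals_by_state(rows))  # {'NY': 15, 'WA': 7}
```

This is exactly the kind of string parsing that breaks when a label format changes, which is why a separate state or FIPS column is more robust.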

martiL commented 4 years ago

As part of the hackathon "https://wirvsvirushackathon.org/" we are implementing a RESTful web service that scrapes data from various places. We have developed a landing page to make it easier for interested people to implement a scraper and/or use our API.

Maybe you can help make the API stable and reusable! That way we only have to fix breaking changes once ;-)

check it out: https://corona-api-landingpage.netlify.com/

shahesam84 commented 4 years ago

As part of the hackathon "https://wirvsvirushackathon.org/" we are implementing a RESTful web service that scrapes data from various places. We have developed a landing page to make it easier for interested people to implement a scraper and/or use our API.

Maybe you can help make the API stable and reusable! That way we only have to fix breaking changes once ;-)

check it out: https://corona-api-landingpage.netlify.com/

Thanks for this. This API's output lags 2 days behind. I tried "https://corona.ndo.dev/api/daily" and it only showed results up to March 20th. It is good as a backup.

kanungle commented 4 years ago

Thanks for giving us a heads up so we may prepare for the changes. And thank you for all the work you're doing!

pomber commented 4 years ago

Can you please provide us a date/time for that cutover? Can we place these new files into a different folder and leave the old files in place? That way, the dashboards we may have running won't be full of errors when the cutover happens.

This ☝️. Please @CSSEGISandData, help us minimize the breaking changes.

advaithasabnis commented 4 years ago

The ISO code will be added in the global time series tables.

I think this has not been added yet, so the format of the global time series will change again.

piccolbo commented 4 years ago

Trying to answer my own question, this ticket mentions FIPS code will be added. Those could be county codes or state codes, I hope both, but in either case should support aggregation without unsavory brittle regex tricks.
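If the files do carry 5-digit county FIPS codes, state totals fall out of the code itself: the first two digits are the state FIPS code (36 is New York, 53 is Washington). A sketch with made-up counts and hypothetical function names; the zero-padding matters because codes for states like Alabama (01) lose their leading zero when a CSV column is parsed as numbers:

```python
from collections import defaultdict


def state_fips(county_fips):
    """Return the 2-digit state prefix of a 5-digit county FIPS code.

    Accepts ints, plain strings, or float-like strings such as '1001.0'.
    """
    code = str(int(float(county_fips))).zfill(5)
    if len(code) != 5:
        raise ValueError(f"not a county FIPS code: {county_fips!r}")
    return code[:2]


def aggregate_to_states(county_counts):
    """Sum a {county_fips: count} mapping into {state_fips: count}."""
    totals = defaultdict(int)
    for fips, count in county_counts.items():
        totals[state_fips(fips)] += count
    return dict(totals)


counts = {"36047": 10, 36119: 5, "1001.0": 2, 53033: 7}
print(aggregate_to_states(counts))  # {'36': 15, '01': 2, '53': 7}
```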

rtroha commented 4 years ago

Are you planning to include Canadian provincial data in one of the data sets?

Thanks

jipiboily commented 4 years ago

@rtroha it's already in there, no? Working on email reports right now, using worldwide data, but also Canadian data by provinces (I'm in Canada).

rtroha commented 4 years ago

@jipiboily It's there now, but when the format changes they said they're getting rid of state data, so if that's true I'm not sure how they're going to handle Canada (I'm in the US but we have reporting needs for Canada as well).

advaithasabnis commented 4 years ago

@rtroha

Changes to the current time series include the removal of the US state and county-level entries, which will be replaced with a new single country level entry for the US.

I'm assuming they're only getting rid of US state data. Canadian provincial data is still in the 'global' file, at least for now.

tautme commented 4 years ago

@CSSEGISandData Thanks for continuing to make this better and for being transparent. I look forward to having the counts at the county level in the US.

louchios commented 4 years ago

The Living Atlas US Cases feature layer is listed as deprecated, but I believe it is still being updated and now has county level data. US Feature Layer

yetzt commented 4 years ago

Instead of changing only the time series, you broke the consistency of the data format in the daily reports. #1326

alexchandel commented 4 years ago

@CSSEGISandData please KEEP and UPDATE a recovered cases file. As you've consolidated all states into one "US" row, your argument no longer applies. Unbreak your repository.

bevanward commented 4 years ago

@CSSEGISandData keen to understand where the change process is at?

It does not seem that the changes made here match the ones planned and, as others have mentioned, it is now less clear how to build a clean data set of fine-grained time series data, which is what we all want.

If we want a fine-grained data set, do we have to mash the daily reports together with the time series, replacing the country + county/state, etc. in the time series each day?

It does not seem to make sense the way this repository is progressing.

Could you please explain how to compile this data based on the current changes and those planned for the future?

Thanks
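For what it's worth, stacking the daily reports yourself is one way to get there. A rough sketch, assuming each daily file has `Country_Region`, `Province_State` and `Confirmed` columns (older files use other header spellings and would need renaming first) and that you key the files by ISO dates so they sort chronologically. The function name and sample rows are made up:

```python
import csv
import io
from collections import defaultdict


def build_time_series(daily_reports):
    """Stack daily snapshot CSVs into {(country, province): {date: confirmed}}.

    `daily_reports` maps an ISO date label to that day's CSV text.
    """
    series = defaultdict(dict)
    for date in sorted(daily_reports):
        for row in csv.DictReader(io.StringIO(daily_reports[date])):
            key = (row["Country_Region"], row["Province_State"])
            # Several rows (e.g. counties) can share a key; sum them.
            series[key][date] = series[key].get(date, 0) + int(row["Confirmed"])
    return dict(series)


day1 = "Country_Region,Province_State,Confirmed\nUS,New York,100\nUS,New York,20\nCanada,Ontario,5\n"
day2 = "Country_Region,Province_State,Confirmed\nUS,New York,150\nCanada,Ontario,8\n"
ts = build_time_series({"2020-03-23": day2, "2020-03-22": day1})
print(ts[("US", "New York")])  # {'2020-03-22': 120, '2020-03-23': 150}
```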

yystat commented 4 years ago

It does NOT make any sense to me why you removed US states but still keep state-level entries for other countries (e.g., provinces in China, Canada, etc.) in "time_series_covid19_confirmed_global.csv".

advaithasabnis commented 4 years ago

@yystat Separate US time-series with county level details is coming...

Not sure why it isn't there yet, before the deprecation of the old one starts. Ideally there should be some overlap.

yystat commented 4 years ago

@yystat Separate US time-series with county level details is coming...

Not sure why it isn't there yet, before the deprecation of the old one starts. Ideally there should be some overlap.

I'm not sure why they want to keep US data in a separate file. Previously I only needed to download one file, and if I wanted to focus on the US, I just selected region==US. Now I have to deal with 2 separate files.

arik-so commented 4 years ago

What time are you planning on uploading the US time series files such as time_series_covid19_confirmed_US.csv?

cortical-iv commented 4 years ago

@yystat Separate US time-series with county level details is coming... Not sure why it isn't there yet, before the deprecation of the old one starts. Ideally there should be some overlap.

I'm not sure why they want to keep US data in a separate file. Previously I only needed to download one file, and if I wanted to focus on the US, I just selected region==US. Now I have to deal with 2 separate files.

This is a deal breaker for me. I can't believe they did this in the middle of the pandemic. I built something in a code sprint in my spare time that worked great and now it is broken and I do not have time to fix this. Please put back the US states!

And they put this in issues, not on the front page of the repo, or the readme pages for the different csv folders. I mean...my site just got up and running on Saturday and now it's basically useless.

kingwatam commented 4 years ago

No reliable data source reporting recovered cases for many countries, such as the US.

Just because US doesn't report recovery, does that mean the rest of the world must follow?

I really would hope the team would reconsider this, as I'm sure a lot of other folks around the world would appreciate this as well. There's no reliable recovery data for some countries (namely the US), but most countries provide this valuable information. Are you going to remove this from the interactive map too?

piccolbo commented 4 years ago

They said "many countries". As big as it is, the US is just one.

regattaguru commented 4 years ago

We all know that data out of the US is scarcely reliable due to their lack of a centralised system of reporting, so why deprive us of more reliable data from the rest of the world that has coordinated health care reporting? If this data set is now just going to be tailored to the needs of the US administration, then it is not reliable for the rest of the world.

advaithasabnis commented 4 years ago

So I see the deprecation notice in the readme and asking everyone to use the new "global" data but where is this:

The ISO code will be added in the global time series tables.

When that's added later, things will break again...

clyde7 commented 4 years ago

For US data, will longitude/latitude coordinates still be provided, or do I have to find a way to map FIPS to longitude/latitude?

ebwinters commented 4 years ago

Please keep state level data for the US. Provinces for other countries are still reported, not sure why US wouldn't be, especially considering JHU is in the US 😆

mkosunen commented 4 years ago

The number of recovered cases is VERY important data: with exponential growth like this, the measure to look at is the growth RELATIVE to the number of active cases. Without that knowledge, I would say the data is useless.
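To make that concrete: active cases are confirmed minus deaths minus recovered, so this ratio cannot be computed without recovered counts. A minimal sketch, assuming cumulative daily lists for one location (function names and numbers are hypothetical):

```python
def active_cases(confirmed, deaths, recovered):
    """Active = cumulative confirmed minus cumulative deaths and recoveries."""
    return confirmed - deaths - recovered


def growth_relative_to_active(confirmed, deaths, recovered):
    """New cases on day t divided by active cases on day t-1.

    Inputs are cumulative daily series of equal length; a day whose
    previous active count is zero or negative yields None.
    """
    if not len(confirmed) == len(deaths) == len(recovered):
        raise ValueError("series must have the same length")
    rates = []
    for t in range(1, len(confirmed)):
        prev_active = active_cases(confirmed[t - 1], deaths[t - 1], recovered[t - 1])
        new_cases = confirmed[t] - confirmed[t - 1]
        rates.append(new_cases / prev_active if prev_active > 0 else None)
    return rates


# Made-up numbers: 30 new cases each day, but fewer active cases on day 1.
print(growth_relative_to_active([100, 130, 160], [0, 0, 0], [0, 50, 60]))
# [0.3, 0.375]
```

With identical daily new cases, the relative growth still differs once recoveries shrink the active pool, which is the information lost when recovered is dropped.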

mspandit commented 4 years ago

It's not acceptable to publish some of the data in different formats in the same commit. For example, cse_covid_19_daily_reports/03-23-2020.csv is a different format from cse_covid_19_daily_reports/03-22-2020.csv which is a different format from cse_covid_19_daily_reports/02-01-2020.csv.

I can understand the schema changing over time (e.g. recovered counts). If that happens, then

  1. Ensure all the data in the previous commit is using the previous schema.
  2. Ensure all the data in the current commit is using the new schema.

That way, any code that reads the data can be consistent for each version of your repo, instead of being different for various date ranges. (And if you don't expect code to read your data, then why are you publishing it?)

Even better, publish all the data in all versions of the schema in every commit. That way the code that reads the data can remain consistent across multiple versions of your repo.
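On the reader side, one defensive pattern until the schema settles is to map every header variant onto one canonical set when loading. A sketch, assuming the old and new daily reports differ mainly in spellings like `Province/State` vs `Province_State`; the alias table is our assumption and should be checked against the files you actually download:

```python
import csv
import io

# Assumed mapping from older header spellings to newer ones.
HEADER_ALIASES = {
    "Province/State": "Province_State",
    "Country/Region": "Country_Region",
    "Last Update": "Last_Update",
}

CANONICAL = ["Province_State", "Country_Region", "Last_Update",
             "Confirmed", "Deaths", "Recovered"]


def read_normalized(csv_text):
    """Read a daily report into dicts keyed by CANONICAL column names."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {}
        for key, value in raw.items():
            if key is None:  # extra fields beyond the header row
                continue
            key = key.lstrip("\ufeff").strip()  # drop a BOM or stray spaces
            row[HEADER_ALIASES.get(key, key)] = value
        # Columns missing from a given schema come back as None.
        rows.append({col: row.get(col) for col in CANONICAL})
    return rows


old = "Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered\nHubei,China,2020-02-01,100,2,3\n"
print(read_normalized(old)[0]["Province_State"])  # Hubei
```

The sample row uses made-up numbers; the point is only that both schemas come out with the same keys.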

SimonVillage commented 4 years ago

It's not acceptable to publish some of the data in different formats in the same commit. For example, cse_covid_19_daily_reports/03-23-2020.csv is a different format from cse_covid_19_daily_reports/03-22-2020.csv which is a different format from cse_covid_19_daily_reports/02-01-2020.csv.

I can understand the schema changing over time (e.g. recovered counts). If that happens, then

  1. Ensure all the data in the previous commit is using the previous schema.
  2. Ensure all the data in the current commit is using the new schema.

That way, any code that reads the data can be consistent for each version of your repo, instead of being different for various date ranges. (And if you don't expect code to read your data, then why are you publishing it?)

Even better, publish all the data in all versions of the schema in every commit. That way the code that reads the data can remain consistent across multiple versions of your repo.

I can't believe that the data is provided by Johns Hopkins :D

Wikunia commented 4 years ago

Why do you push the global data already if the US data is not published? Thousands depend on this data set and you just change it the way you like from day to day without any consistency. Please be more thoughtful. This is amazing data but it's worthless if people need hours every week to rework their scripts.

Additionally, in 2 weeks this data will be completely useless without recovered counts. It's already useless for China, e.g. in Hubei, China there are about 60,000 recovered cases.

micder commented 4 years ago

The number of recovered cases is VERY important data: with exponential growth like this, the measure to look at is the growth RELATIVE to the number of active cases. Without that knowledge, I would say the data is useless.

I agree with you. Without the recovered cases it's not possible to estimate the curve of actual infections, and this is a serious gap for the analysis. It's not possible to make predictions.

thefilmmaking commented 4 years ago

I can understand the frustration, as another person who also keeps having to tweak code to adapt to these changes, but I'm shocked by all the complaining. The fact that we even have access to this data that they are putting days and nights of effort into gathering is an absolute gift. Just being provided access to their hard work is wonderful. Let's practice a little gratitude during these times. 🙏

GregFrei commented 4 years ago

No reliable data source reporting recovered cases for many countries, such as the US.

Wouldn't it be possible to provide recoveries for the countries with reliable sources and keep the rest NA?
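The key on the consumer side is that NA means "not reported", not zero. A minimal sketch, assuming a hypothetical table where unreliable recovered counts are left empty or written as `NA`; the column names, country labels and numbers are made up:

```python
import csv
import io


def parse_count(field):
    """Return an int, or None when the field is empty or 'NA' (not reported)."""
    field = (field or "").strip()
    if field == "" or field.upper() == "NA":
        return None
    return int(field)


def active_where_known(csv_text):
    """Map each country to confirmed - deaths - recovered, or None if unknown."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        confirmed = parse_count(row["Confirmed"])
        deaths = parse_count(row["Deaths"])
        recovered = parse_count(row["Recovered"])
        if None in (confirmed, deaths, recovered):
            # Propagate "unknown" instead of silently treating it as zero.
            result[row["Country_Region"]] = None
        else:
            result[row["Country_Region"]] = confirmed - deaths - recovered
    return result


sample = (
    "Country_Region,Confirmed,Deaths,Recovered\n"
    "Country A,100,10,20\n"
    "Country B,200,5,NA\n"
)
print(active_where_known(sample))  # {'Country A': 70, 'Country B': None}
```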

nickjevershed commented 4 years ago

I've removed recovered cases from our feeds as per these changes, but I'm curious as to why recovered cases are still showing on the ArcGIS dashboard? Many thanks for all your hard work.