Provide a combined version of the daily report

CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE

https://systems.jhu.edu/research/public-health/ncov/

Provide a combined version of the daily report #1701

Open salmasian opened 4 years ago

salmasian commented 4 years ago

Loading all files in /csse_covid_19_data/csse_covid_19_daily_reports/ and pasting them together into one long time series file is trivial, but it would be nice if that folder already contained a file called combined.csv, the result of concatenating all the other CSV files. Obviously, this file would need to be updated daily.
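For anyone who needs this locally, the concatenation described above can be sketched in Python with only the standard library. This is a minimal sketch, not something the repository provides: the function name combine_reports and the extra source_file column are hypothetical, and the code does not assume every daily file shares the same header, so it writes the union of all columns and leaves missing cells empty.

```python
import csv
from pathlib import Path


def combine_reports(folder, out_path, source_column="source_file"):
    """Concatenate every CSV in `folder` into one CSV at `out_path`.

    `source_column` is a hypothetical extra column recording which
    file each row came from. Returns the number of data rows written.
    """
    out_path = Path(out_path)
    # Skip the output file itself in case it lives in the same folder.
    paths = sorted(
        p for p in Path(folder).glob("*.csv")
        if p.resolve() != out_path.resolve()
    )

    rows = []
    fieldnames = [source_column]
    for path in paths:
        # utf-8-sig drops a byte-order mark if a file starts with one.
        with path.open(newline="", encoding="utf-8-sig") as handle:
            reader = csv.DictReader(handle)
            # Collect the union of headers, in order of first appearance.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                row[source_column] = path.name
                rows.append(row)

    with out_path.open("w", newline="", encoding="utf-8") as handle:
        # restval fills columns a given file lacks; extrasaction="ignore"
        # drops stray cells beyond a file's own header.
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling combine_reports("csse_covid_19_data/csse_covid_19_daily_reports", "combined.csv") from a checkout of the repository would produce the combined file. Files are read in file-name order, which is not necessarily chronological, so sort on a date column afterwards if order matters.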

This is of particular interest because, due to #1250, the time series data no longer contains province/state-level data.

cipriancraciun commented 4 years ago

Given that several issues are already open on the topic of data summarization, such as #1681, would you consider closing this ticket and following the other one? (This would help the JHU team, as they are already swamped with tickets.)

That being said, I have built a derived dataset based on the JHU dataset, in which (among other fixes and features) I also aggregate the data for China, the US, and all the other countries, and merge everything into a single dataset for easy use:

https://github.com/cipriancraciun/covid19-datasets
I have described it in issue #1281 and in the README of the repo above.

(I try as much as possible not to change the format; for the most part I only add new rows and columns.)

salmasian commented 4 years ago

@cipriancraciun Sure, please feel free to close it and mark it as a duplicate of the closest related issue.

As for your own repository: keeping the original CSV format would be a good idea.

cipriancraciun commented 4 years ago

@salmasian Given that I'm not an owner of this repository, I can't close this issue; however, you, as the original poster, can close it.

Regarding my repository, the original format is almost unusable with "regular tools" (it is a pivot table); therefore, my derived format uses a "SQL table" approach.
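To illustrate the difference between the two layouts, here is a minimal Python sketch of reshaping a pivot-style table into one row per record. It assumes that "pivot table" means one column per date and that "SQL table" means one row per location and date; the function wide_to_long, its parameter names, and the sample values are hypothetical and are not taken from either repository.

```python
def wide_to_long(rows, id_columns, var_name="date", value_name="value"):
    """Turn wide rows (one column per date) into long rows.

    `rows` is a list of dicts, e.g. as produced by csv.DictReader.
    Columns listed in `id_columns` are copied to every output row;
    every other column becomes its own (var_name, value_name) row.
    """
    long_rows = []
    for row in rows:
        ids = {col: row[col] for col in id_columns}
        for col, val in row.items():
            if col in id_columns:
                continue
            long_rows.append({**ids, var_name: col, value_name: val})
    return long_rows


# Made-up sample values, purely for illustration.
wide = [
    {"Country/Region": "A", "1/22/20": "1", "1/23/20": "2"},
    {"Country/Region": "B", "1/22/20": "0", "1/23/20": "5"},
]
for record in wide_to_long(wide, id_columns=["Country/Region"]):
    print(record)
```

In the long layout, filtering by date or grouping by country becomes an ordinary row operation, which is what makes it convenient for SQL and similar tools.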