CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

This needs to stop!!!!! #1280

Open HappyIRL opened 4 years ago

HappyIRL commented 4 years ago

Hey,

Please STOP changing the format every day. I am splitting this file into strings and accessing the fields by index. If you change the format every day, it is useless for anyone who wants to use this. What if the index for the date is 4 one day and 0 the next?!
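One way to make a parser less sensitive to this kind of change is to look columns up by header name instead of by fixed position. The following is only a minimal sketch: the header spellings in `DATE_ALIASES`, the helper names, and both sample layouts are made up for illustration and are not taken from the JHU files.

```python
import csv
import io

# Hypothetical header spellings for the date column; the real files may differ.
DATE_ALIASES = ("Date", "Last Update")


def find_column(header, aliases):
    """Return the index of the first alias that appears in the header row."""
    for name in aliases:
        if name in header:
            return header.index(name)
    raise KeyError(f"none of {aliases} found in header {header}")


def read_dates(text):
    """Return the date field of every data row, wherever that column sits."""
    rows = csv.reader(io.StringIO(text))
    header = [cell.strip() for cell in next(rows)]
    date_idx = find_column(header, DATE_ALIASES)
    return [row[date_idx] for row in rows]


# Two made-up layouts: the date is at index 4 in one and at index 0 in the other.
layout_a = "Province,Country,Lat,Long,Date,Confirmed\nHubei,China,30.9,112.2,2020-03-01,100\n"
layout_b = "Last Update,Province,Country,Confirmed\n2020-03-02,Hubei,China,120\n"

dates_a = read_dates(layout_a)
dates_b = read_dates(layout_b)
```

A lookup like this still fails when a column is renamed to something not in the alias list, but it then fails with an explicit error rather than silently reading the wrong field.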

cipriancraciun commented 4 years ago

I have just opened an issue where I describe a derived dataset (in a more SQL friendly format) that I have created based on the JHU original data: https://github.com/CSSEGISandData/COVID-19/issues/1281

skr47ch commented 4 years ago

Have a look at the readme before complaining. (I've bolded the part for you) These guys are doing this as a service, not because they are required to. Do not blame others for your lack of flexibility. Me and many others are using this, and we change our models as required.

Look at other posts for workarounds and cleaned-up data.

Terms of Use:

This GitHub repo and its contents herein, including all data, mapping, and analysis, copyright 2020 Johns Hopkins University, all rights reserved, is provided to the public strictly for educational and academic research purposes. The Website relies upon publicly available data from multiple sources, that do not always agree. The Johns Hopkins University hereby disclaims any and all representations and warranties with respect to the Website, including accuracy, fitness for use, and merchantability. Reliance on the Website for medical guidance or use of the Website in commerce is strictly prohibited.

martiL commented 4 years ago

In the course of the hackathon "https://wirvsvirushackathon.org/" we are implementing a RESTful API that scrapes data from various sources. We have built a landing page to make it easier for interested people to implement a scraper and/or use our API.

Maybe you can help make the API stable and reusable! Then we only have to fix breaking changes in one place, with united forces.

Check it out: https://corona-api-landingpage.netlify.com/

HappyIRL commented 4 years ago

@skr47ch I get what you are saying. But a database that changes its format from day to day is just useless spam. I appreciate the work they do; I was just angry that my code was completely broken, hence the bad tone. Sorry :).

jeremiahjp commented 4 years ago

Clearly they are working on making it consistent, with fewer errors. The fact that they gather nearly all available data into one simple CSV is already quite remarkable. If something changes, then adapt to the changes.

cipriancraciun commented 4 years ago

@HappyIRL In order to help the JHU team (who I assume are already swamped by work and issues), and to keep things tidy, would you consider closing this issue and perhaps following #1250, which seems to be the ticket that collects all issues on this topic?

ghost commented 4 years ago

I echo @HappyIRL 100%. Yes, they are performing an extremely important service, no question, but that warrants adherence to data 101.

And understanding they are likely swamped, there's an army of potential contributors here to assist, establish a simple format, and even transform historical data (which is critically important) into the standard format (which, if done correctly with sufficient thought, like any standard becomes relatively immutable). We're here to help and analyze, not to work around breaking changes and rewrite transformation code daily. Create branches and in no time we'd have data that's both consistent and easily ingestible into your weapon of choice. And for those unable to procure compute in these difficult times, I'm willing to provide it.

My apologies too for the tone, but I'm frustrated.

And please note that this does not bear on the quality of the data itself. My frustration is solely about the general format (e.g., echoing a comment on #1250 and other issues, it should be rotated) and the fact that it keeps changing.
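To illustrate what "rotated" could mean here, the sketch below turns a table with one column per date into one record per region and date. It assumes the input has that wide shape; the `rotate` helper, the `region` column name, and the sample values are invented for the example and are not the actual JHU schema.

```python
import csv
import io


def rotate(text, id_columns):
    """Turn one-column-per-date rows into one record per (region, date)."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        # Columns that identify the region are copied into every output record.
        ids = {name: row[name] for name in id_columns}
        for column, value in row.items():
            if column in id_columns:
                continue
            # Every remaining column is assumed to be a date holding a count.
            records.append({**ids, "date": column, "value": int(value)})
    return records


# Made-up wide input: two regions, two date columns.
wide = "region,2020-03-01,2020-03-02\nA,1,2\nB,3,5\n"
long_rows = rotate(wide, id_columns=("region",))
```

In the rotated form a new day adds rows rather than a new column, so the set of columns stays the same from day to day.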

Thank you, thank you JHU, and please everyone stay well.

cipriancraciun commented 4 years ago

At the moment the data format seems pretty stable (with all its quirks), and for two consecutive days it hasn't produced a single parsing exception.

Therefore "fixing" it at this time would actually "break" everyone's scripts once more. Unfortunately this "legacy" format has now become "the standard", at least for the JHU dataset.