Closed yetzt closed 4 years ago
Are they going to be changing this back soon??
The removal of aggregated US states is really frustrating me. Naively, it's easy enough to just group-by state and sum cases over all of its cities, but there is nothing guaranteeing that all of a state's cities are included in the data. I'd have more confidence if all three levels of aggregation were provided in the raw data: city level, state level, and country level case numbers.
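For reference, the naive group-by described above can be sketched as follows. This is a minimal illustration, not the repository's own tooling: the column names (`Admin2`, `Province_State`, `Country_Region`, `Confirmed`) and the sample rows are assumptions made up for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in an assumed county-level layout; not real case numbers.
SAMPLE = """Admin2,Province_State,Country_Region,Confirmed
King,Washington,US,100
Snohomish,Washington,US,50
New York City,New York,US,300
"""


def sum_by_state(csv_text, country="US"):
    """Sum confirmed cases per state over whatever rows are present."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Country_Region"] == country:
            totals[row["Province_State"]] += int(row["Confirmed"])
    return dict(totals)


state_totals = sum_by_state(SAMPLE)
```

The sum only covers rows that happen to be in the file. Nothing here can detect a missing city or county, which is exactly why a state-level figure from the publisher would be more trustworthy.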
Wow - this totally broke an ETL package that took 12+ hours to write. Talk about a weird move.
i don't agree with the op: sometimes it's unavoidable to change things in a way that makes "things break downstream". i would argue the other way around: write things so that they don't break in such cases.
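One way to read "write things so that they don't break" is to resolve column names through a list of known aliases and fail loudly when none match. The sketch below assumes the header variants `Province/State` / `Province_State` and `Country/Region` / `Country_Region`; the alias table and the function name are illustrative, not part of any published schema.

```python
import csv
import io

# Canonical name -> header spellings this consumer accepts (assumed variants).
ALIASES = {
    "province_state": ("Province_State", "Province/State"),
    "country_region": ("Country_Region", "Country/Region"),
    "confirmed": ("Confirmed",),
}


def normalize_rows(csv_text):
    """Return rows keyed by canonical names, or raise if a column is unknown."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    mapping = {}
    for canonical, candidates in ALIASES.items():
        found = next((c for c in candidates if c in header), None)
        if found is None:
            # Fail loudly instead of silently publishing wrong numbers.
            raise ValueError(f"no known column for {canonical!r} in {header}")
        mapping[canonical] = found
    return [{canon: row[src] for canon, src in mapping.items()} for row in reader]


# Hypothetical one-row files with the two header styles.
OLD = "Province/State,Country/Region,Confirmed\nHubei,China,10\n"
NEW = "Province_State,Country_Region,Confirmed\nHubei,China,10\n"
```

This only covers renames the consumer already knows about. A rename to an unseen spelling still stops the pipeline, but with an explicit error rather than erroneous output.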
@chrisdane It's rarely unavoidable to suddenly break things. Whoever produces high-stakes data should use deprecation procedures to avoid scenarios like this. In this case it would have been simple to put the new format in a separate path and produce the current format from the new data. If, for example, the US suddenly switched from imperial to metric measures, loads of things would break, so they either don't switch or use the alternative system in parallel.
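The "produce the current format from the new data" idea can be sketched roughly like this. The old and new header names and the sample rows are assumptions for illustration, and `to_legacy_csv` is a hypothetical helper, not something the maintainers provide.

```python
import csv
import io


def to_legacy_csv(new_csv_text):
    """Derive a state-level file with the old headers from county-level rows."""
    totals = {}
    for row in csv.DictReader(io.StringIO(new_csv_text)):
        key = (row["Province_State"], row["Country_Region"])
        totals[key] = totals.get(key, 0) + int(row["Confirmed"])

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Assumed legacy header; existing consumers keep reading these names.
    writer.writerow(["Province/State", "Country/Region", "Confirmed"])
    for (state, country), confirmed in totals.items():
        writer.writerow([state, country, confirmed])
    return out.getvalue()


# Hypothetical county-level input in the assumed new layout.
NEW_FORMAT = """Admin2,Province_State,Country_Region,Confirmed
King,Washington,US,100
Snohomish,Washington,US,50
New York City,New York,US,300
"""

legacy_text = to_legacy_csv(NEW_FORMAT)
```

Publishing the derived file at the old path and the new file at a new path would let downstream consumers migrate on their own schedule.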
yet another surprise: the filenames for the timeline files were changed. this repository is not trustworthy. we can't rely on the stability of fieldnames or filenames, identifiers, date formats, ...
really don't understand your problem @yetzt.
i can use the time_series*.csv files without any problems: https://github.com/chrisdane/COVID-19/tree/mybranch/r_plots#Germany
the renaming of these files was even pinned on their issue board. you're making an elephant out of a mosquito, i feel =/
it's even in their readme:
The Johns Hopkins University hereby disclaims any and all
representations and warranties with respect to the Website,
including accuracy, fitness for use, and merchantability.
Reliance on the Website for medical guidance or
use of the Website in commerce is strictly prohibited.
@chrisdane scrapers don't read pinned issues and the changes weren't even mentioned in #1250
read it carefully. nothing about removing files, nothing about changing field identifiers, all changes in relation to the time series and not the daily reports. and you ask what the stakes are: the most regarded data visualisation on corona in germany went down or displayed erroneous data as a result of these issues. not once, but multiple times. not a good thing to happen in a time of crisis where information is crucial.
as i mentioned in #1615 i've migrated my code away from this repository, since it does not adhere to any quality standards or best practices on open data. when we get past this, i will share my experience with the broader data journalism community.
Dear whoever is at fault for this,
Please do not, ever, under any circumstances, rename or otherwise change the identifiers in your data. Especially not as a surprise. Especially especially when so many people rely on the data being available.
People are writing software, relying and depending on the stability of APIs and data structures. You just annoyed and frustrated many people.
Please consider this plea
(And while we're talking: removing US states and replacing them with municipal data isn't considered very nice either. Expand your data, don't replace it. Create separate files. Open Data 101: before you do things, ask yourself: will this break things downstream?)