CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

Time Series Table #521

Closed: CSSEGISandData closed this issue 4 years ago

CSSEGISandData commented 4 years ago

Due to the renaming of many locations, we are working on adjusting the time series tables. Thank you for your patience.

lukesneeringer commented 4 years ago

Related to https://github.com/CSSEGISandData/COVID-19/issues/504#issuecomment-597878734, I think; noting here so it cross-references.

Thank you!

aminadibi commented 4 years ago

Guys, we run code that relies on your table, and it is being broken every day. We appreciate your hard work, but it's important to keep things consistent.

AnthonyEbert commented 4 years ago

Thank you for providing your data. I suggest using the ISO 3166-1 alpha-3 codes because:

They're just three letters, so they couldn't be less political. Some "countries" on this list are universally recognized as countries, and some are universally recognized as not-countries (e.g. Christmas Island). This means that you can't guess my politics from TWN, IRN, or ISR, which is perfect.

For the purposes of the visualization app, you could use the US State Department list or whichever list you prefer.

pixelscript commented 4 years ago

I agree. The data doesn't even have to contain any country names; it just needs a consistent index. We are using the data to drive our visualisations and can map to our own names. If I want to name GBR "The land of tea and Hob Nobs", I'd be free to do so.
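To make the "consistent index" idea concrete, here is a minimal Python sketch, assuming a hand-maintained alias table that maps every label variant to one ISO 3166-1 alpha-3 code. The names `COUNTRY_ALIASES`, `DISPLAY_NAMES`, `country_key`, `totals_by_key`, and `display_name` are hypothetical, and the alias entries are illustrative examples of the kind of variants discussed in this thread, not a verified list of labels the feed has used.

```python
# Hypothetical alias table: each label variant maps to one ISO 3166-1 alpha-3 code.
# Entries are illustrative; a real table must cover every label the feed has used.
COUNTRY_ALIASES = {
    "China": "CHN",
    "Mainland China": "CHN",
    "Iran": "IRN",
    "Iran (Islamic Republic of)": "IRN",
    "Korea, South": "KOR",
    "Republic of Korea": "KOR",
    "South Korea": "KOR",
    "United Kingdom": "GBR",
    "UK": "GBR",
}

# Display names are a separate, purely cosmetic layer keyed on the code.
DISPLAY_NAMES = {"GBR": "The land of tea and Hob Nobs"}


def country_key(name):
    """Return the stable alpha-3 key for a country label; fail loudly on unknown labels."""
    label = name.strip()
    if label not in COUNTRY_ALIASES:
        raise KeyError(f"unmapped country label: {label!r}")
    return COUNTRY_ALIASES[label]


def totals_by_key(rows):
    """Sum (country_label, count) pairs under their canonical alpha-3 key."""
    totals = {}
    for label, count in rows:
        key = country_key(label)
        totals[key] = totals.get(key, 0) + count
    return totals


def display_name(key):
    """Pick a display name for a key, falling back to the code itself."""
    return DISPLAY_NAMES.get(key, key)
```

For example, `totals_by_key([("Iran", 10), ("Iran (Islamic Republic of)", 5)])` gives `{"IRN": 15}`, so a renamed label merges into the same key instead of showing up as a second country. Raising on unknown labels is a deliberate choice: a new spelling stops the pipeline rather than silently creating a duplicate.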

thomas101rcx commented 4 years ago

Can you please fix the country naming issue? It is constantly breaking everyone's code. A suggestion: at least for the US, perhaps merge all the counties into one row per state. A lot of people's code is probably overcounting now.

MaBeuLux88 commented 4 years ago

If you could do this kind of test in a "develop" branch instead of hacking directly on the master branch, that would be really awesome. You broke my charts too; I had to use a backup: https://www.mongodb.com/blog/post/tracking-coronavirus-news-with-mongodb-charts Thanks again for providing this data though, it's really awesome.

MaBeuLux88 commented 4 years ago


Iran appears twice under different names.

coronastats commented 4 years ago

Thank you. Will you fix the data in /csse_covid_19_data/csse_covid_19_daily_reports as well? My code is based on the files in that directory, and it suffers from the same problem.

analyzewithpower commented 4 years ago

Thank you. On the university's dashboard, there are zero discrepancies as to county/state/country name. The data seems to be clean and consistent. Couldn't we get that same data?

I don't think the data that you use in the university's dashboard has these inconsistencies. Why can't you share that same data?

treerunner commented 4 years ago

US state data prior to 3/10 is incomplete. This is such a great effort; why not be consistent and clean with the data?

For example, on 3/2/20 there is a confirmed case in New York County, NY, but the row for New York State has no data until 3/10.

The above scenario is not possible. Thank you very much for all your work.
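The scenario above can be caught mechanically: wherever a row is meant to be a total, it should never be smaller than the sum of its child rows. A minimal Python sketch, assuming each region's series is a dict from date label to cumulative count; the function name is hypothetical and the numbers in the usage lines are toy values, not the real feed. The check is only meaningful for periods in which the parent row is supposed to be a total of its children.

```python
def find_inconsistent_dates(parent, children):
    """Return (date, parent_count, child_sum) wherever a total is below the sum of its parts.

    parent: dict mapping a date label to the parent region's cumulative count
    children: list of dicts of the same shape, one per child region
    """
    problems = []
    for day, parent_count in parent.items():
        child_sum = sum(child.get(day, 0) for child in children)
        if parent_count < child_sum:
            problems.append((day, parent_count, child_sum))
    return problems


# Toy values shaped like the New York example (not real feed numbers).
new_york_state = {"3/2/20": 0, "3/10/20": 5}
new_york_county = {"3/2/20": 1, "3/10/20": 1}
print(find_inconsistent_dates(new_york_state, [new_york_county]))
# [('3/2/20', 0, 1)]
```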

lazd commented 4 years ago

Renaming is totally fine, but please provide the most precise data you can, and do not duplicate numbers for cities into their respective counties, counties into their respective states, or states into their respective countries. If people need a summary for a given county/state/country, they can produce one by geolocating whether each city is within the location they want to summarize.
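The geolocation step described above is usually a point-in-polygon test. Below is a minimal Python sketch using the standard even-odd (ray casting) rule, assuming you have each row's Lat/Long and a boundary polygon for the region as a list of (lon, lat) vertices; where the boundaries come from is left open, and the function names are hypothetical. It treats coordinates as planar, which is a simplification.

```python
def point_in_polygon(lon, lat, polygon):
    """Even-odd (ray casting) test for whether (lon, lat) lies inside a polygon.

    polygon: list of (lon, lat) vertices; the last vertex joins back to the first.
    Coordinates are treated as planar, which is adequate for small regions away from
    the poles and the antimeridian. Points exactly on an edge may go either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge meets that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def summarize(points, polygon):
    """Sum counts of (lon, lat, count) points falling inside the polygon."""
    return sum(count for lon, lat, count in points if point_in_polygon(lon, lat, polygon))
```

With a 2x2 square `[(0, 0), (2, 0), (2, 2), (0, 2)]`, `summarize([(1, 1, 4), (0.5, 1.5, 3), (3, 3, 10)], square)` returns 7, because only the first two points fall inside. Real boundaries with holes or multiple parts would need a GIS library rather than this bare test.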

AndroidDev77 commented 4 years ago

In the US you have removed county data and bunched it into state data. This only lowers its value: before, you could see high-outbreak locations, but now that data may be shifted hundreds of miles away to the state's center. For example, Westchester County, NY has 122 cases and New York City has 48, accounting for 170 of New York State's 212 cases. These now show up at the state's center, making it look as though New York City has 0 cases.

rks125 commented 4 years ago

@AndroidDev77, yes, this is annoying. Over the past two days there has been a movement away from providing county data in the US. I agree, the data is losing its value: it's critical to see the starting point (whatever your subjective opinion of a good value to start at, given the significant lack of testing available in the US) and use it to compare against other geographic clusters. I hope this isn't politics at work and that someone is just accidentally omitting it.

analyzewithpower commented 4 years ago

@rks125 I am afraid it may be. It's not as if the University doesn't have this data in a clean format: they use it in their dashboard, and as a university data research team they know better than anyone how 'small' changes impact reporting. I am afraid this goes deeper than someone omitting it. We have people relying on our independent reports and dashboards; these issues break the dashboards, and people lose confidence in both the timeliness and the accuracy of the reports. For example, I have measures that calculate new cases by comparing the prior report's locations and confirmed counts against the current report. Because the name 'China', for example, was changed, my report showed THOUSANDS of new cases in China, which is inaccurate. If we are not proactive in catching these 'innocent' changes in the data, it is only a matter of time before our dashboards and reports are discredited, people stop trusting them, and they turn back to the dashboards and counts provided by other sources!
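One defensive pattern for the new-cases calculation described above is to refuse to compute a diff when a location that previously had cases disappears, since that usually means it was renamed. A minimal Python sketch, assuming each report has been reduced to a dict from location label to cumulative confirmed count; the function name is hypothetical. Locations that appear for the first time are allowed, because new places legitimately start reporting.

```python
def new_cases(prev, curr):
    """Per-location new confirmed cases between two cumulative reports.

    prev, curr: dicts mapping a location label to its cumulative confirmed count.
    A location that had cases in the previous report should not vanish from the
    current one; when it does, the label was most likely renamed, and silently
    treating the new label as a brand-new location would inflate new cases.
    """
    vanished = sorted(loc for loc in prev.keys() - curr.keys() if prev[loc] > 0)
    if vanished:
        appeared = sorted(curr.keys() - prev.keys())
        raise ValueError(
            f"locations vanished: {vanished}; new labels: {appeared}; check for renames"
        )
    return {loc: count - prev.get(loc, 0) for loc, count in curr.items()}
```

With toy numbers, `new_cases({"Italy": 10}, {"Italy": 12, "Iceland": 1})` returns `{"Italy": 2, "Iceland": 1}`, while a relabel such as `"Mainland China"` to `"China"` raises instead of reporting the whole cumulative total as new cases.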

terrillmoore commented 4 years ago

For doing studies of the data, it's important to be able to pull data by country. Prior to 3/11, you could sum the cases with US in column 2 and get the right total. But as of 3/11 there is double counting: "Washington" now appears with a subtotal of the individual counties. That makes it very hard to extract totals for the US alone, if that's what you need.

jgriffi6 commented 4 years ago

To reiterate the above:
1) Thanks for the hard work.
2) Consistent country names.
3) No rollups within the dataset.
   a) If you don't have finer-grained info within a state/province, just add the remainder as a single entry, e.g.:
      King County, WA ....
      Grant County, WA ....
      ....
      Other, WA .... (not a total for WA)

Then rollups would just work.
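With leaf-only rows like these, rollups reduce to a plain group-and-sum. A minimal Python sketch, assuming rows shaped as (location, state, country, count) tuples in which no row is itself a total; the function name and the counts are hypothetical toy values.

```python
from collections import defaultdict


def roll_up(leaf_rows):
    """Sum leaf rows into per-state and per-country totals.

    leaf_rows: iterable of (location, state, country, count) tuples in which every
    row is a leaf (a county, or an "Other" remainder row) and none is itself a total.
    """
    by_state = defaultdict(int)
    by_country = defaultdict(int)
    for _location, state, country, count in leaf_rows:
        by_state[(country, state)] += count
        by_country[country] += count
    return dict(by_state), dict(by_country)


# Toy counts, laid out like the King County / Grant County / Other, WA example.
rows = [
    ("King County, WA", "Washington", "US", 5),
    ("Grant County, WA", "Washington", "US", 1),
    ("Other, WA", "Washington", "US", 2),
]
states, countries = roll_up(rows)
print(states)     # {('US', 'Washington'): 8}
print(countries)  # {'US': 8}
```

The "Other" row is what keeps this correct: cases that can't be assigned to a county still reach the state and country totals without a separate total row that consumers would have to filter out.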

Thanks!

joetynan11 commented 4 years ago

Everyone appreciates the time and effort taken, but can you not just add one more column with the "official/new name" instead of changing the naming convention we have used up until now? Korea has changed 3 times so far.

Mainland China has 0 cases and China receives 80000 in one day.

CSSEGISandData commented 4 years ago

The time series tables have been updated.

AndroidDev77 commented 4 years ago

OK, this is good. However, there is double counting: NY State shows 220, and its counties add up to 220. Users could add logic to account for this, but will this always be the case for US states?

lazd commented 4 years ago

@AndroidDev77 I am currently ignoring all state-level data in the US by checking whether the Province/State equals a US state name. Unfortunately, this isn't sustainable (what if city data is mixed into county data, county data into state data, or state data into country data?). We'll have to get clever about locating points within regions and checking the Province/State region to figure out what's going on.

lukesneeringer commented 4 years ago

United States data is still really broken / duplicative.

terrillmoore commented 4 years ago

It's easy enough to process. For columns on or after 3/11, for the US, keep a table of states (easy to find online). If the row matches a state name, that's the total (and ignore the things that don't match). For columns before 3/11, the rows with state names are all zero, and so can be included in the total. It was a five minute change for my display app.
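The rule above can be sketched in a few lines of Python against the wide time series layout (Province/State, Country/Region, Lat, Long, then one column per date such as "3/11/20"). Assumptions: the state list is deliberately abbreviated (a real run needs every state name), the 3/11 cutoff is taken from the comment above and may need adjusting for other snapshots of the data, and the sample table uses made-up coordinates and counts. `us_total` is a hypothetical helper name.

```python
import csv
import io
from datetime import date, datetime

# Assumption: abbreviated for the sketch; a real run needs the full list of state names.
US_STATES = {"California", "New York", "Washington"}
# First column in which state rows became subtotals, per the comment above.
CUTOFF = date(2020, 3, 11)


def us_total(rows, column):
    """US total for one date column of the wide time series table (e.g. "3/11/20")."""
    column_date = datetime.strptime(column, "%m/%d/%y").date()
    total = 0
    for row in rows:
        if row["Country/Region"] != "US":
            continue
        is_state_row = row["Province/State"] in US_STATES
        # On or after the cutoff, state rows already total their counties, so count only them.
        # Before it, state rows are zero, so every US row can safely be included.
        if column_date >= CUTOFF and not is_state_row:
            continue
        total += int(row[column] or 0)
    return total


# Toy table in the time series layout (made-up coordinates and counts).
SAMPLE = """Province/State,Country/Region,Lat,Long,3/10/20,3/11/20
Washington,US,0,0,0,30
"King County, WA",US,0,0,10,20
"Snohomish County, WA",US,0,0,5,10
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(us_total(rows, "3/10/20"))  # 15
print(us_total(rows, "3/11/20"))  # 30
```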

Given how much work they've done, I don't mind having to patch my app a little on updates. I'm more interested in historical data than live display; their web interface is fine for live display. No point reinventing that wheel.

Great work, and thanks to the team for producing this database!

lazd commented 4 years ago

@terrillmoore yup, that's exactly what I'm doing. That said, it will become problematic if they offer data for a city that's within a county they have data for -- if that happens, you'll have to ignore all counties (a list is easily attainable for that as well) in favor of cities the same way we're ignoring all states in favor of counties and cities.
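A more general version of "ignore the coarser level in favor of the finer one" is to sum only the leaves of a location hierarchy. A minimal Python sketch, assuming you build a parent map yourself (for example with a point-in-polygon test), since the dataset does not ship one; the function names and toy counts are hypothetical. One caveat: a parent's cases that are not assigned to any child are dropped, which is exactly the gap an explicit "Other" remainder row would fill.

```python
def leaf_locations(parent_of):
    """Locations that no other location names as its parent.

    parent_of: dict mapping each location to its enclosing location (None at the top).
    Summing only leaves avoids double counting at any depth (country, state, county, city).
    """
    parents = {parent for parent in parent_of.values() if parent is not None}
    return {loc for loc in parent_of if loc not in parents}


def leaf_total(counts, parent_of):
    """Sum counts over leaf locations only."""
    return sum(counts.get(loc, 0) for loc in leaf_locations(parent_of))


# Toy hierarchy and counts (not real feed numbers).
parent_of = {
    "US": None,
    "New York": "US",
    "New York County, NY": "New York",
    "Westchester County, NY": "New York",
    "Washington": "US",
}
counts = {"New York": 220, "New York County, NY": 100,
          "Westchester County, NY": 120, "Washington": 30}
print(leaf_total(counts, parent_of))  # 250
```

Here the "New York" subtotal is skipped because it has county children, while "Washington" counts because nothing finer is listed under it.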

pixelscript commented 4 years ago

FYI I've got a little script here that'll turn the data into a consistent JSON set. Includes ISO 3166-1 alpha-3 and alpha-2 codes.

https://github.com/pixelscript/covid-19-data
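For readers who want to do a similar conversion themselves, here is a minimal stdlib Python sketch of reshaping the wide time series CSV into one JSON record per row per date. It is not the linked script's implementation: the output field names, the `iso3_for` lookup parameter, and the toy sample are assumptions made for illustration.

```python
import csv
import io
import json
from datetime import datetime

ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def wide_to_records(csv_text, iso3_for=lambda label: None):
    """Reshape the wide time series CSV (one column per date) into one record per row per date.

    iso3_for: callable mapping a Country/Region label to an alpha-3 code (or None).
    The output field names are this sketch's own choice, not a published schema.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    date_columns = [c for c in reader.fieldnames if c not in ID_COLUMNS]
    records = []
    for row in reader:
        for column in date_columns:
            records.append({
                "province_state": row["Province/State"] or None,
                "country_region": row["Country/Region"],
                "iso3": iso3_for(row["Country/Region"]),
                # Normalize "1/23/20"-style headers to ISO 8601 dates.
                "date": datetime.strptime(column, "%m/%d/%y").date().isoformat(),
                "confirmed": int(row[column] or 0),
            })
    return records


# Toy input with an empty Province/State, as country-level rows have.
SAMPLE = "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20\n,Iran,0,0,0,2\n"
print(json.dumps(wide_to_records(SAMPLE, {"Iran": "IRN"}.get)[1]))
# {"province_state": null, "country_region": "Iran", "iso3": "IRN", "date": "2020-01-23", "confirmed": 2}
```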