CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

Country Names don't follow WHO conventions, and please add WHO country code #1390

Open JetForMe opened 4 years ago

JetForMe commented 4 years ago

For example, in the time series global data, the United States is listed as "US". This makes it hard to correlate with other tables that use the official WHO designation of "United States."

To go a step further, could you add the 3-letter WHO code for each country or region? Here's a list of them: https://ghoapi.azureedge.net/api/DIMENSION/COUNTRY/DimensionValues

Thanks!

bsdunx commented 4 years ago

Another option to consider is using ISO standardized country codes as these are an internationally ratified standard.

cipriancraciun commented 4 years ago

I have created a derived dataset, available in JSON and TSV (in a more SQL-friendly format), where I tried to solve some of these issues (normalizing country names, providing dates in ISO format, etc.). I also tried to compute simple metrics and to augment the data with additional information (continent, sub-continent, country totals, day index since reaching N confirmed cases, etc.):

joachim-gassen commented 4 years ago

In case you are interested: here I present code that uses fuzzy matching with manual correction to match ISO3c country codes (ISO 3166-1 alpha-3) to the repo's time series data. I agree that it would be good if established country identifiers were included with the data.
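For readers who want to see the general idea, here is a minimal Python sketch of fuzzy matching with manual correction. It is not the linked code. The lookup table is a tiny illustrative subset, the override entries are examples of names that fuzzy matching handles poorly, and `to_iso3` and its `cutoff` default are hypothetical names and values chosen for this example.

```python
from difflib import get_close_matches

# Illustrative subset of ISO 3166-1 alpha-3 codes (assumption: a real
# table would cover every country and territory).
ISO3_BY_NAME = {
    "United States": "USA",
    "South Korea": "KOR",
    "Germany": "DEU",
    "France": "FRA",
    "Italy": "ITA",
}

# Manual corrections for dataset names that are too far from the
# reference name for fuzzy matching to resolve (example entries).
MANUAL_OVERRIDES = {
    "US": "USA",
    "Korea, South": "KOR",
}


def to_iso3(name, cutoff=0.8):
    """Return an ISO3 code for a country name, or None if nothing matches."""
    name = name.strip()
    # 1. Manual corrections take priority over everything else.
    if name in MANUAL_OVERRIDES:
        return MANUAL_OVERRIDES[name]
    # 2. Exact match against the reference table.
    if name in ISO3_BY_NAME:
        return ISO3_BY_NAME[name]
    # 3. Fuzzy match: take the closest reference name above the cutoff.
    matches = get_close_matches(name, list(ISO3_BY_NAME), n=1, cutoff=cutoff)
    return ISO3_BY_NAME[matches[0]] if matches else None
```

Names that come back as `None` are the ones that need a new manual override, so it is worth listing them after each data update.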

rmunjuluri commented 4 years ago

Here is another link that you may be interested in with ISO3 Codes for Country, Continent, Continental_SubRegion and Intermediary_Regions

https://github.com/rmunjuluri/JHU-COVID-19-Timeseries-with-cleanup

JetForMe commented 4 years ago

I can't find good descriptions of the WHO data, but they do use 3-letter codes for countries and regions (although regions like AMR have sub-groupings A, B, etc.: https://apps.who.int/gho/data/node.metadata.REGION). I can't tell if their codes match ISO3 codes.

payamazadi commented 4 years ago

Bump. It is really hard to use this data with literally anything when the country names are written this way. This seems easy to improve, but since you are only providing raw data, it seems hard for people to provide a PR to improve this.

payamazadi commented 4 years ago

you can use this API as a translation layer but this is really messy man https://restcountries.eu/#api-endpoints-name

payamazadi commented 4 years ago

wait, looks like they may have addressed this here: https://github.com/CSSEGISandData/COVID-19/issues/1791