CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

Request: column for continent OR row with calculated number #862

Open 00lex opened 4 years ago

00lex commented 4 years ago

It would be great if you added a column for continents at the end, or new rows with calculated totals for each continent.

Thank you.

piccolbo commented 4 years ago

You can use rnaturalearth (https://www.rdocumentation.org/packages/rnaturalearth/versions/0.1.0) together with the Lat/Long locations.

Bost commented 4 years ago

Some discussion is in progress, but it is not a high priority for us at the moment. However, that may change as more and more countries are hit.

My personal coding progress can be observed here. When I'm done, I'll port it to Python and discuss with the owner of this repo / web service whether it pays off to integrate it into that service.

EtienneCmb commented 4 years ago

@00lex if you're interested, I have a JSON file that groups countries by continent: continents.json

Bost commented 4 years ago

> @00lex if you're interested, I have a JSON file that groups countries by continent: continents.json

@EtienneCmb Please add Kosovo

EtienneCmb commented 4 years ago

@Bost done

Bost commented 4 years ago

@EtienneCmb Strictly speaking, Turkey is a country on two continents, Asia and Europe. By which criteria did you assign it to Europe? The same goes for Russia and maybe some others, too. Could you document this, please? (I'm also going to open a ticket in your repo.) Thanks

Edit: I can't create a ticket in your repo, huh? Why is that?

Bost commented 4 years ago

@EtienneCmb @00lex Also, we need to think about people on ships, who belong to... eh, which continent?

00lex commented 4 years ago

I don't like the solutions mentioned so far. It's faster if I add an extra column myself and pick the continent for each of the 273 lines. For my script that's fine. I will provide a template, and maybe the repo owner will use it, but I don't know when I can do that.

Bost commented 4 years ago

@EtienneCmb Your continents.json doesn't use ISO 3166 for country names. That's a big problem. Never mind, I'm on it.

EtienneCmb commented 4 years ago

Hi @Bost,

> @EtienneCmb Strictly speaking, Turkey is a country on two continents, Asia and Europe. By which criteria did you assign it to Europe? The same goes for Russia and maybe some others, too. Could you document this, please? (I'm also going to open a ticket in your repo.) Thanks

Actually, the grouping was not performed by me but by @brungio.

> @EtienneCmb Your continents.json doesn't use ISO 3166 for country names. That's a big problem. Never mind, I'm on it.

I think the country names follow what's inside the time-series CSV files, so that we can apply automatic Python rules for setting the continent.

EtienneCmb commented 4 years ago

@00lex

Inserting the "Continents" column using the JSON file is straightforward with Python:

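import json

import pandas as pd

# load the continent -> list-of-countries groups
with open("continents.json") as f:
    continents = json.load(f)

# build the replacement table: country name (pattern) -> continent
repl = {}
for cont, couns in continents.items():
    for coun in couns:
        repl[coun] = cont

# load the covid time-series
df = pd.read_csv("../csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv")
# regex=True treats every key as a regular expression matched anywhere in the
# name, so names with special characters or shared prefixes may need escaping
df['Continents'] = df['Country/Region'].replace(repl, regex=True)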
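
For the other half of the original request (rows with calculated numbers per continent), here is a minimal, dependency-free sketch. It assumes the time-series layout with `Province/State`, `Country/Region`, `Lat` and `Long` columns followed by one integer column per date. The sample rows, the `COUNTRY_TO_CONTINENT` excerpt and the `continent_totals` helper are all made up for illustration and are not part of the repository.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the layout of the time-series CSV (values are made up).
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,Italy,43.0,12.0,0,2
,France,46.2,2.2,1,3
Hubei,China,30.9,112.2,10,20
,Japan,36.0,138.0,2,2
"""

# Hypothetical excerpt of a country -> continent lookup.
COUNTRY_TO_CONTINENT = {
    "Italy": "Europe",
    "France": "Europe",
    "China": "Asia",
    "Japan": "Asia",
}

# Columns that describe a row rather than hold a daily count.
META_COLUMNS = {"Province/State", "Country/Region", "Lat", "Long"}


def continent_totals(csv_text, country_to_continent):
    """Sum every date column per continent; unmapped countries go to 'Unknown'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    date_cols = [c for c in reader.fieldnames if c not in META_COLUMNS]
    totals = defaultdict(lambda: dict.fromkeys(date_cols, 0))
    for row in reader:
        continent = country_to_continent.get(row["Country/Region"], "Unknown")
        for col in date_cols:
            # Empty cells count as 0; assumes the counts are plain integers.
            totals[continent][col] += int(row[col] or 0)
    return dict(totals)


totals = continent_totals(SAMPLE_CSV, COUNTRY_TO_CONTINENT)
for continent, per_date in sorted(totals.items()):
    print(continent, per_date)
```

Collecting unmapped countries under "Unknown" rather than dropping them keeps the world total unchanged and makes gaps in the grouping visible.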

@Bost

> Edit: I can't create a ticket in your repo, huh? Why is that?

I have absolutely no idea why, but the issue tab was deactivated! It should work now. Open an issue if you want, and I can work on it with @brungio.

brungio commented 4 years ago

I used Wikipedia's groupings. They need to be checked well. The names of countries are changed in the data repos sometimes, so the groupings themselves need to be checked every day somehow.
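
One way to automate that recurring check, as a rough sketch: compare the country names found in the data against the names listed in the grouping and report any that are missing. It assumes continents.json maps each continent to a list of country names (the layout the Python snippet in this thread iterates over). The `unmapped_countries` helper and the sample data are hypothetical.

```python
def unmapped_countries(csv_country_names, continents):
    """Return the sorted country names that appear in no continent group."""
    known = {name for names in continents.values() for name in names}
    return sorted(set(csv_country_names) - known)


# Hypothetical excerpt of the grouping and of the 'Country/Region' column.
continents = {
    "Europe": ["Italy", "France"],
    "Asia": ["China", "Korea, South"],
}
csv_names = ["Italy", "China", "South Korea", "France", "Italy"]

missing = unmapped_countries(csv_names, continents)
print(missing)  # ['South Korea']
```

Running such a check after every data update would flag renamed or newly added countries before they silently end up without a continent.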

Bost commented 4 years ago

@brungio was it this one? https://en.wikipedia.org/wiki/List_of_sovereign_states_and_dependent_territories_by_continent_(data_file)

brungio commented 4 years ago

Hi @Bost

Here is the list of references. As I said, someone needs to check this thoroughly. Johns Hopkins does not follow these labels all the time, and it changes labels from time to time.

https://en.wikipedia.org/wiki/Europe#List_of_states_and_territories
https://en.wikipedia.org/wiki/North_America#Countries,_territories,_and_dependencies
https://en.wikipedia.org/wiki/South_America#Countries_and_territories
https://en.wikipedia.org/wiki/List_of_Oceanian_countries_by_area
https://en.wikipedia.org/wiki/List_of_Asian_countries_by_area

The case of Turkey is a bit ambiguous because Turkey spans two continents (Europe and Asia). I think it makes more sense to attribute Turkey to Asia because the vast majority of its landmass is on that continent. Possibly the majority of Turkey's population is in Asia as well.
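
Ambiguous entries like this could be kept in a small, documented override table that is consulted before the general grouping, so the chosen criterion is recorded next to the decision. The sketch below is only an illustration: the `OVERRIDES` table, the `continent_for` helper, the "Cruise Ship" label and the "Other" bucket are assumptions, and the Turkey entry simply encodes the landmass argument above.

```python
# Hypothetical override table: name -> (continent, documented reason).
OVERRIDES = {
    "Turkey": ("Asia", "majority of landmass lies in Asia"),
    "Cruise Ship": ("Other", "ships are not assigned to any continent"),
}


def continent_for(country, base_mapping, overrides=OVERRIDES):
    """Look up a continent, letting documented overrides win over the base mapping."""
    if country in overrides:
        return overrides[country][0]
    return base_mapping.get(country, "Unknown")


# Hypothetical base grouping, e.g. one derived from continents.json.
base = {"Turkey": "Europe", "Italy": "Europe"}

print(continent_for("Turkey", base))       # Asia
print(continent_for("Italy", base))        # Europe
print(continent_for("Cruise Ship", base))  # Other
```

Keeping the reason string beside each override answers the "according to which criteria" question directly in the data.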