juancalvof opened this issue 4 years ago
Hi @JuanCalvoFerrandiz, thanks again for your bug report.
Do you mind detailing which data source returns these values?
Thanks!
I hope this data helps :)
This has already been reported to CoronaDataScraper: https://github.com/covidatlas/coronadatascraper/issues/528
This afternoon I will try to find more cases using the data you provided, and I will also send them your visualization in case it helps.
Hi guys,
This is my exploration code for fixing the viz: aggregation, lat/long, adding ISO3 codes and adding an official name column. Hope it helps:
```python
import pandas as pd

import task_geo.data_sources as ds


# Return the unique values of a DataFrame column, sorted, as a one-column DataFrame
def series_unique(df, column):
    unique_values = df.loc[:, column].unique()
    return pd.DataFrame(data=unique_values,
                        columns=["unique_" + column]).sort_values("unique_" + column, ignore_index=True)


# Build a {country: value} dictionary from a column of df_carto (first match per country)
def create_dict(column):
    mapping = {}
    for value in df_unique_country_cl.loc[:, "unique_country"]:
        mapping[value] = df_carto.loc[df_carto["country"] == value, column].iloc[0]
    return mapping


# 0_Correction of aggregate values in countries:
# rows with no state, county or city are treated as country-level rows
data_cds = ds.cds()
is_country_row = data_cds["state"].isnull() & data_cds["county"].isnull() & data_cds["city"].isnull()
data_cds.loc[is_country_row, "aggregate"] = "country"

# Getting unique values from the country column
data_cds_country_raw = data_cds.loc[data_cds["aggregate"] == "country"]
df_unique_country = series_unique(data_cds_country_raw, "country")

# Getting df_carto (raw string, so the backslashes in the Windows path are not escapes)
df_carto = pd.read_csv(r"..\DATA\RAW\Countries data\world_borders.csv", sep=",")
df_carto.rename(columns={"name": "country"}, inplace=True)

# 1_Getting the country_carto column
df_unique_country_cl = series_unique(df_carto, "country")
# CDS country names with no direct equivalent in df_carto
df_left = df_unique_country.merge(df_unique_country_cl, how="outer", indicator=True).loc[
    lambda x: x["_merge"] == "left_only"]
cds_names = df_left.loc[:, "unique_country"]
# Official names, listed in the same order as cds_names: zip pairs them by position
carto_names = ["Brunei Darussalam", "Congo", "Czech Republic", "Cote d'Ivoire", "Timor-Leste", "Swaziland",
               "Iran (Islamic Republic of)", "Kosovo", "Lao People's Democratic Republic", "Libyan Arab Jamahiriya",
               "Republic of Moldova", "Burma", "The former Yugoslav Republic of Macedonia", "Palestine",
               "Western Sahara", "Korea, Democratic People's Republic of", "South Sudan", "Syrian Arab Republic",
               "Sao Tome and Principe", "United Republic of Tanzania", "Bahamas", "Gambia", "Holy See (Vatican City)",
               "Viet Nam"]
# Create a dict from the two lists (renamed so the built-ins dict and list are not shadowed)
name_map = dict(zip(cds_names, carto_names))
data_cds.insert(4, "country_carto", data_cds.loc[:, "country"].map(name_map).fillna(data_cds.loc[:, "country"]))

# 2_Getting ISO3 codes (Kosovo and South Sudan are added by hand)
dict_iso = create_dict("iso3")
dict_iso["Kosovo"] = "RKS"
dict_iso["South Sudan"] = "SSD"
data_cds.insert(5, "iso3", data_cds.loc[:, "country_carto"].map(dict_iso))

# data_cds_country: .copy() avoids SettingWithCopyWarning when adding lat/long below
data_cds_country = data_cds.loc[data_cds["aggregate"] == "country"].copy()

# 3_Getting lat, only for countries
dict_lat = create_dict("lat")
dict_lat["Kosovo"] = 42.667542
dict_lat["South Sudan"] = 6.8769908
data_cds_country["lat"] = data_cds_country.loc[:, "country_carto"].map(dict_lat)

# 4_Getting long, only for countries
dict_long = create_dict("lon")
dict_long["Kosovo"] = 21.166191
dict_long["South Sudan"] = 31.3069782
data_cds_country["long"] = data_cds_country["country_carto"].map(dict_long)

data_cds_country.to_csv(r"C:\Users\juanc\Google Drive\CORONAWHY\DATASETS\data_cds_countries.csv", encoding="UTF-8")
```
[world_borders.zip](https://github.com/CoronaWhy/task-geo/files/4456383/world_borders.zip)
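A minimal, self-contained sketch of the same name-mapping and lookup idea on toy data, assuming a carto table with `country`, `iso3`, `lat` and `lon` columns like world_borders.csv. The toy rows, the `"Vietnam"` spelling and the helper `lookup_table` are hypothetical. It swaps the positional `zip` for an explicit name dict, and the per-country loop for `drop_duplicates(...).set_index(...).to_dict()`, which likewise keeps the first row per country.

```python
import pandas as pd

# Hypothetical toy stand-ins for the CDS data and world_borders.csv
cds = pd.DataFrame({"country": ["Vietnam", "Spain", "Kosovo"]})
carto = pd.DataFrame({
    "country": ["Viet Nam", "Spain"],
    "iso3": ["VNM", "ESP"],
    "lat": [16.6, 40.2],
    "lon": [106.3, -3.6],
})

# Explicit CDS name -> carto name pairs, so nothing depends on list order
name_map = {"Vietnam": "Viet Nam"}


def lookup_table(carto_df, column, overrides=None):
    """Hypothetical helper: {country: value} for `column`, first row per country wins."""
    table = carto_df.drop_duplicates("country").set_index("country")[column].to_dict()
    table.update(overrides or {})  # hand-added countries missing from the carto table
    return table


# Names without a mapping fall back to the original CDS name
cds["country_carto"] = cds["country"].map(name_map).fillna(cds["country"])
cds["iso3"] = cds["country_carto"].map(lookup_table(carto, "iso3", {"Kosovo": "RKS"}))
cds["lat"] = cds["country_carto"].map(lookup_table(carto, "lat", {"Kosovo": 42.667542}))
cds["long"] = cds["country_carto"].map(lookup_table(carto, "lon", {"Kosovo": 21.166191}))
print(cds)
```

An explicit dict makes each correction reviewable on its own line, at the cost of typing the CDS spelling next to the official one.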
While reading the docs, I realized that the values of the `aggregate` field are actually correct; we should be looking at the `level` field instead. More info
I will upload this along with the ISO codes.
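A minimal sketch of selecting country rows from the `level` field instead of rebuilding `aggregate` from null `state`/`county`/`city` values. It assumes `level` holds strings such as `"country"` and `"state"`; the toy rows are hypothetical and the exact values should be checked against the CDS docs.

```python
import pandas as pd

# Hypothetical rows; the real CDS output has many more columns
data_cds = pd.DataFrame({
    "country": ["Spain", "Spain", "Italy"],
    "state": [None, "Madrid", None],
    "level": ["country", "state", "country"],
})

# Select country-level rows directly from `level`; .copy() so new columns can be added safely
data_cds_country = data_cds.loc[data_cds["level"] == "country"].copy()
print(data_cds_country["country"].tolist())
```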
Update from the CDS team:
@ManuelAlvarezC @JuanCalvoFerrandiz we will soon migrate to completely different coordinates, calculated with country-levels: https://github.com/hyperknot/country-levels
Please check whether this issue is still present in a few days.
Source: https://github.com/covidatlas/coronadatascraper/issues/528#issuecomment-611988126
Commit SHA:
```
commit c4af4d71d781ac9fd204713b41cd79650519497c (HEAD -> master, origin/master, origin/HEAD)
Merge: f23824d 3de7d90
Author: Manuel Alvarez Campo <manuel@pythiac.com>
Date:   Sat Apr 4 13:48:16 2020 +0200

    Merge pull request #35 from shaikh-raj/master

    Adding metadata for CDS datasource
```
Description
Please review the lat/long values: some countries have wrong coordinates. I made this viz to help visualize the situation. Click a dot to see its country, lat and long values: https://juancalvo.carto.com/builder/3ad41c17-bc07-4889-b047-5903300806c4/embed
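One way to find more of the wrong coordinates, sketched under assumptions: compare each country's CDS `lat`/`long` with a reference centroid (for example from world_borders.csv) and flag large great-circle (haversine) distances. The `ref_lat`/`ref_lon` column names, the 1000 km threshold and the toy rows (Kosovo's latitude sign is flipped on purpose) are hypothetical.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Hypothetical toy data: CDS coordinates next to reference centroids
df = pd.DataFrame({
    "country": ["Spain", "Kosovo"],
    "lat": [40.2, -42.667542],  # sign flipped on purpose to simulate a bad value
    "long": [-3.6, 21.166191],
    "ref_lat": [40.2, 42.667542],
    "ref_lon": [-3.6, 21.166191],
})

# Distance between reported and reference position; large values are suspect
df["error_km"] = haversine_km(df["lat"], df["long"], df["ref_lat"], df["ref_lon"])
suspect = df.loc[df["error_km"] > 1000, ["country", "lat", "long", "error_km"]]
print(suspect)
```

The threshold is a judgment call: large countries have centroids far from any single reported point, so a generous cutoff reduces false alarms.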