ExpDev07 / coronavirus-tracker-api

🦠 A simple and fast (< 200ms) API for tracking the global coronavirus (COVID-19, SARS-CoV-2) outbreak. It's written in python using the 🔥 FastAPI framework. Supports multiple sources!
https://coronavirus-tracker-api.herokuapp.com
GNU General Public License v3.0
1.59k stars 323 forks

Population Numbers Inaccurate #262

Closed: ccedacero closed this issue 4 years ago

ccedacero commented 4 years ago

country_population is using outdated data. From what I can tell, it may be using figures from the last census, nearly 10 years ago. I may have missed documentation noting that, but I just want to make sure it wasn't overlooked. Thanks for the API! It's great!

toxyl commented 4 years ago

I noticed that as well. I scraped this list from Wikipedia:

{
  "China": 1401754280,
  "India": 1359772087,
  "United States": 329448153,
  "Indonesia": 266911900,
  "Brazil": 211252866,
  "Pakistan": 218939520,
  "Nigeria": 206139587,
  "Bangladesh": 168265026,
  "Russia": 146745098,
  "Mexico": 126577691,
  "Japan": 126010000,
  "Philippines": 108402887,
  "Egypt": 100127124,
  "Ethiopia": 98665000,
  "Vietnam": 96208984,
  "DR Congo": 89561404,
  "Iran": 83279228,
  "Turkey": 83154997,
  "Germany": 83149300,
  "France": 67064000,
  "Thailand": 66481242,
  "United Kingdom": 66435600,
  "Italy": 60243406,
  "South Africa": 58775022,
  "Tanzania": 55890747,
  "Myanmar": 54339766,
  "Korea, South": 51780579,
  "Colombia": 49395678,
  "Kenya": 47564296,
  "Spain": 47100396,
  "Argentina": 44938712,
  "Algeria": 43000000,
  "Sudan": 42343075,
  "Ukraine": 41902416,
  "Uganda": 40299300,
  "Iraq": 39127900,
  "Poland": 38386000,
  "Canada": 37956869,
  "Morocco": 35838381,
  "Saudi Arabia": 34218169,
  "Uzbekistan": 34068416,
  "Malaysia": 32718760,
  "Afghanistan": 32225560,
  "Venezuela": 32219521,
  "Peru": 32131400,
  "Angola": 31127674,
  "Ghana": 30280811,
  "Mozambique": 30066648,
  "Yemen": 29825968,
  "Nepal": 29609623,
  "Cameroon": 26545864,
  "Ivory Coast": 25823071,
  "Madagascar": 25680342,
  "Australia": 25645795,
  "North Korea": 25450000,
  "Taiwan": 23604265,
  "Niger": 22314743,
  "Sri Lanka": 21803000,
  "Burkina Faso": 20870060,
  "Mali": 19973000,
  "Romania": 19405156,
  "Chile": 19107216,
  "Kazakhstan": 18662768,
  "Malawi": 17563749,
  "Syria": 17500657,
  "Netherlands": 17444381,
  "Ecuador": 17443880,
  "Zambia": 17381168,
  "Guatemala": 16604026,
  "Senegal": 16209125,
  "Somalia": 15893219,
  "Chad": 15692969,
  "Cambodia": 15288489,
  "Zimbabwe": 15159624,
  "South Sudan": 12778250,
  "Rwanda": 12374397,
  "Guinea": 12218357,
  "Benin": 11733059,
  "Tunisia": 11722038,
  "Haiti": 11577779,
  "Belgium": 11524454,
  "Bolivia": 11469896,
  "Cuba": 11209628,
  "Burundi": 10953317,
  "Greece": 10724599,
  "Czechia": 10681161,
  "Jordan": 10635640,
  "Dominican Republic": 10358320,
  "Sweden": 10333456,
  "Portugal": 10276617,
  "Azerbaijan": 10067108,
  "United Arab Emirates": 9890400,
  "Hungary": 9772756,
  "Belarus": 9413446,
  "Israel": 9171450,
  "Honduras": 9158345,
  "Tajikistan": 9127000,
  "Papua New Guinea": 8935000,
  "Austria": 8902600,
  "Switzerland": 8586550,
  "Sierra Leone": 7901454,
  "Togo": 7538000,
  "Hong Kong": 7500700,
  "Paraguay": 7152703,
  "Laos": 7123205,
  "Bulgaria": 7000039,
  "Serbia": 6963764,
  "Libya": 6871287,
  "Lebanon": 6825442,
  "Kyrgyzstan": 6523500,
  "El Salvador": 6486201,
  "Nicaragua": 6460411,
  "Turkmenistan": 6031187,
  "Denmark": 5822763,
  "Singapore": 5703600,
  "Finland": 5527573,
  "Congo (Kinshasa)": 5518092,
  "Central African Republic": 5496011,
  "Slovakia": 5456362,
  "Norway": 5367580,
  "Costa Rica": 5058007,
  "occupied Palestinian territory": 4976684,
  "New Zealand": 4970195,
  "Ireland": 4921500,
  "Oman": 4664790,
  "Liberia": 4475353,
  "Kuwait": 4420110,
  "Panama": 4218808,
  "Mauritania": 4077347,
  "Croatia": 4076246,
  "Georgia": 3723464,
  "Uruguay": 3518552,
  "Eritrea": 3497117,
  "Mongolia": 3307476,
  "Bosnia and Herzegovina": 3301000,
  "Puerto Rico": 3193694,
  "Armenia": 2957500,
  "Albania": 2862427,
  "Lithuania": 2793350,
  "Qatar": 2747282,
  "Jamaica": 2726667,
  "Moldova": 2681735,
  "Namibia": 2458936,
  "Gambia": 2347706,
  "Botswana": 2338851,
  "Gabon": 2172579,
  "Slovenia": 2094060,
  "North Macedonia": 2077132,
  "Lesotho": 2007201,
  "Latvia": 1906800,
  "Kosovo": 1795666,
  "Guinea-Bissau": 1604528,
  "Bahrain": 1543300,
  "East Timor": 1387149,
  "Trinidad and Tobago": 1363985,
  "Equatorial Guinea": 1358276,
  "Estonia": 1328360,
  "Mauritius": 1265985,
  "Eswatini": 1093238,
  "Djibouti": 1078373,
  "Fiji": 884887,
  "Cyprus": 875900,
  "Comoros": 873724,
  "Guyana": 782766,
  "Bhutan": 741672,
  "Solomon Islands": 680806,
  "Macau": 679600,
  "Montenegro": 622359,
  "Luxembourg": 613894,
  "Western Sahara": 582463,
  "Suriname": 581372,
  "Cape Verde": 550483,
  "Malta": 493559,
  "Transnistria": 469000,
  "Brunei": 442400,
  "Belize": 408487,
  "Bahamas": 385340,
  "Maldives": 374775,
  "Iceland": 364260,
  "Northern Cyprus": 351965,
  "Vanuatu": 304500,
  "Barbados": 287025,
  "New Caledonia": 282200,
  "French Polynesia": 275918,
  "Abkhazia": 244832,
  "São Tomé and Príncipe": 201784,
  "Samoa": 200874,
  "Saint Lucia": 178696,
  "Guam": 172400,
  "Curaçao": 158665,
  "Artsakh": 148000,
  "Kiribati": 120100,
  "Aruba": 112309,
  "Grenada": 112003,
  "Saint Vincent and the Grenadines": 110608,
  "Jersey": 106800,
  "U.S. Virgin Islands": 104578,
  "F.S. Micronesia": 104468,
  "Tonga": 100651,
  "Seychelles": 97625,
  "Antigua and Barbuda": 96453,
  "Isle of Man": 83314,
  "Andorra": 77543,
  "Dominica": 71808,
  "Cayman Islands": 65813,
  "Bermuda": 64027,
  "Guernsey": 62792,
  "American Samoa": 56700,
  "Greenland": 56081,
  "Northern Mariana Islands": 56200,
  "Marshall Islands": 55500,
  "South Ossetia": 53532,
  "Saint Kitts and Nevis": 52823,
  "Faroe Islands": 52124,
  "Turks and Caicos Islands": 41369,
  "Sint Maarten": 40614,
  "Liechtenstein": 38557,
  "Monaco": 38300,
  "Saint Martin": 35746,
  "Gibraltar": 33701,
  "San Marino": 33574,
  "British Virgin Islands": 30030,
  "Åland Islands": 29885,
  "Palau": 17900,
  "Cook Islands": 15200,
  "Anguilla": 14869,
  "Wallis and Futuna": 11700,
  "Nauru": 11000,
  "Tuvalu": 10200,
  "Saint Barthélemy": 9793,
  "Saint Pierre and Miquelon": 6008,
  "Saint Helena Ascension and Tristan da Cunha": 5633,
  "Montserrat": 4989,
  "Falkland Islands": 3198,
  "Christmas Island": 1928,
  "Norfolk Island": 1756,
  "Niue": 1520,
  "Tokelau": 1400,
  "Vatican City": 799,
  "Cocos (Keeling) Islands": 538,
  "Pitcairn Islands": 50,
  "Martinique": 376480,
  "French Guiana": 290691,
  "Mayotte": 279471,
  "Republic of the Congo": 5244359,
  "Cote d'Ivoire": 23740424,
  "Reunion": 859959,
  "Guadeloupe": 395700,
  "The Bahamas": 385637
}
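A minimal Python sketch of how a list like this could be consumed, assuming it is saved as a single JSON object of country name to population. Only a few entries are inlined for brevity, and `POPULATIONS_JSON` and `population_of` are hypothetical names, not part of the API. Lookups are exact matches on the key, which is why names that differ between sources (for example "US" vs "United States") come back empty.

```python
import json
from typing import Optional

# Hypothetical excerpt of the scraped Wikipedia list above (name -> population).
POPULATIONS_JSON = """
{
  "United States": 329448153,
  "Netherlands": 17444381,
  "Taiwan": 23604265,
  "Vatican City": 799
}
"""

POPULATIONS: dict = json.loads(POPULATIONS_JSON)


def population_of(name: str) -> Optional[int]:
    """Return the population for an exact country name, or None if the name is unknown."""
    return POPULATIONS.get(name)


print(population_of("Netherlands"))  # 17444381
print(population_of("US"))           # None: the key in this list is "United States"
```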

Not all country names matched between the two sources, so I mapped them using this list:

{
    "US": "United States",
    "Taiwan*": "Taiwan",
    "Congo (Brazzaville)": "Republic of the Congo",
    "Congo (Kinshasa)": "DR Congo",
    "Gambia, The": "Gambia",
    "Bahamas, The": "Bahamas",
    "Timor-Leste": "East Timor",
    "Cabo Verde": "Cape Verde",
    "Holy See": "Vatican City"
}
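A sketch of applying the alias map before the population lookup, assuming source names (JHU-style, such as "US" or "Holy See") are translated first and unmapped names pass through unchanged. `ALIASES`, `normalize` and `population_for` are illustrative names, and the population table is a small excerpt of the list above.

```python
from typing import Optional

# Alias map from the comment above: source name -> name used in the population list.
ALIASES = {
    "US": "United States",
    "Taiwan*": "Taiwan",
    "Congo (Brazzaville)": "Republic of the Congo",
    "Congo (Kinshasa)": "DR Congo",
    "Gambia, The": "Gambia",
    "Bahamas, The": "Bahamas",
    "Timor-Leste": "East Timor",
    "Cabo Verde": "Cape Verde",
    "Holy See": "Vatican City",
}

# Excerpt of the scraped population list (name -> population).
POPULATIONS = {
    "United States": 329448153,
    "DR Congo": 89561404,
    "Congo (Kinshasa)": 5518092,
    "Republic of the Congo": 5244359,
    "Vatican City": 799,
    "Germany": 83149300,
}


def normalize(name: str) -> str:
    """Translate a source country name into the population list's naming; unmapped names pass through."""
    return ALIASES.get(name, name)


def population_for(source_name: str) -> Optional[int]:
    """Look up a population after alias normalization; None if the country is unknown."""
    return POPULATIONS.get(normalize(source_name))


print(population_for("Holy See"))          # 799
print(population_for("Congo (Kinshasa)"))  # 89561404
```

Because the alias is applied before the lookup, "Congo (Kinshasa)" resolves to the "DR Congo" figure rather than to the separate "Congo (Kinshasa)" entry that also appears in the scraped list.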

HTH

Bost commented 4 years ago

@Toxyl we already have a fairly extensive country name -> country code mapping that covers all sorts of alternative country names, aliases, synonyms, JHU CSSE misspellings, etc.: https://github.com/ExpDev07/coronavirus-tracker-api/blob/master/app/utils/countries.py
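The idea of resolving every name variant to a country code, then keying population data by that code, can be sketched as follows. This is a hypothetical illustration, not the contents of countries.py: `COUNTRY_NAME_TO_CODE`, `DEFAULT_CODE`, `POPULATION_BY_CODE` and the "XX" fallback are assumptions, the codes are ISO 3166-1 alpha-2, and the figures come from the Wikipedia list in this thread.

```python
from typing import Optional

# Hypothetical excerpt of a name -> ISO 3166-1 alpha-2 mapping; several aliases
# (including JHU-style spellings) resolve to the same code.
COUNTRY_NAME_TO_CODE = {
    "United States": "US",
    "US": "US",
    "Netherlands": "NL",
    "DR Congo": "CD",
    "Congo (Kinshasa)": "CD",
    "Holy See": "VA",
    "Vatican City": "VA",
}

DEFAULT_CODE = "XX"  # assumed placeholder for names with no known code

# Population figures keyed by country code (values from the list above).
POPULATION_BY_CODE = {
    "US": 329448153,
    "NL": 17444381,
    "CD": 89561404,
    "VA": 799,
}


def country_code(name: str) -> str:
    """Resolve any known name or alias to its two-letter code."""
    return COUNTRY_NAME_TO_CODE.get(name, DEFAULT_CODE)


def population(name: str) -> Optional[int]:
    """Population lookup via the code, so each data source only has to resolve names to codes."""
    return POPULATION_BY_CODE.get(country_code(name))
```

With this layout, swapping in a different population source only means producing a new code-keyed table; the name handling stays in one place.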

Kilo59 commented 4 years ago

@ccedacero @Toxyl I believe this is where we pull population data from. https://github.com/ExpDev07/coronavirus-tracker-api/blob/5b0197993fcb3eb573ae1dc9a94de9d93902fad7/app/utils/populations.py#L22-L27

toxyl commented 4 years ago

At least for the Netherlands, GeoNames seems to be way off: Wikipedia gives a population of 17,424,978 (November 2019, ranked 64th). I don't know whether that's because GeoNames doesn't take the Caribbean territories into account.

The problem is that the Wikipedia data might be just as flawed. Maybe it's an idea to add a parameter to choose the population source, just like JHU vs CSBS? Something like ?populationdata=wiki / ?populationdata=geonames
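The proposed parameter could be handled with a small registry of population sources, sketched below with only the standard library. Everything here is hypothetical: `populationdata` is just the name suggested above, the loaders are placeholders (the GeoNames one returns nothing because no GeoNames figures are quoted in this thread), defaulting to GeoNames is an assumption, and the `/v2/locations` path is only illustrative.

```python
from typing import Callable, Dict
from urllib.parse import parse_qs, urlsplit

PopulationSource = Callable[[], Dict[str, int]]


def load_wiki() -> Dict[str, int]:
    # Placeholder: would return the scraped Wikipedia list (one entry shown).
    return {"Netherlands": 17444381}


def load_geonames() -> Dict[str, int]:
    # Placeholder: would fetch and parse GeoNames data; empty in this sketch.
    return {}


SOURCES: Dict[str, PopulationSource] = {
    "wiki": load_wiki,
    "geonames": load_geonames,
}
DEFAULT_SOURCE = "geonames"  # assumption: keep the current behaviour as the default


def pick_source(url: str) -> str:
    """Return the source named by ?populationdata=..., falling back to the default."""
    params = parse_qs(urlsplit(url).query)
    requested = params.get("populationdata", [DEFAULT_SOURCE])[0].lower()
    if requested not in SOURCES:
        raise ValueError(f"unknown populationdata source: {requested!r}")
    return requested


def populations_for(url: str) -> Dict[str, int]:
    """Load the population table selected by the request URL."""
    return SOURCES[pick_source(url)]()


print(pick_source("/v2/locations?populationdata=wiki"))  # wiki
print(pick_source("/v2/locations"))                      # geonames
```

Rejecting unknown values instead of silently falling back makes a typo in the parameter visible to the caller, which mirrors how a source switch like JHU vs CSBS would ideally behave.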

ccedacero commented 4 years ago

@Toxyl It looks like the Wikipedia numbers are not far off. Google seems to use Eurostat, the World Bank, or the United Nations, depending on the country. I guess we could scrape those? Or we could scrape Google search results for all countries and collect the data that way. I think the Google approach would be the easiest and most reliable. Or we can stick with Wikipedia.

toxyl commented 4 years ago

I've made a new source with updated population data; here's the PR: https://github.com/ExpDev07/coronavirus-tracker-api/pull/274

My mirror already provides the new data if you want to test it.

ccedacero commented 4 years ago

Great! Thank you!

toxyl commented 4 years ago

You're welcome :)