covid19datahub / COVID19

A worldwide epidemiological database for COVID-19 at fine-grained spatial resolution
https://covid19datahub.io
GNU General Public License v3.0

Many countries do not have data when level = 3 or 2, but apparently google indicators are available for them #191

Closed umbe1987 closed 2 years ago

umbe1987 commented 2 years ago

In Romania, for example, this code results in a data frame with 0 rows (both level 2 and 3):

google <-
  covid19(
    gmr = TRUE,
    country = "Romania",
    level = 3,
    start = "2020-01-01",
    end = Sys.Date(),
    # current date
    dir = "data"
  )

But apparently, google mobility indicators are available at level 3 for Romania (at least from what I can tell downloading the CSV and looking at the NUTS3 names).

Example for Tulcea County (NUTS3 code RO225)

image

I could not check everything, but this could be the same for the following countries (giving me a dataframe with 0 rows, at least with level = 3):

countries <-
  c(
    "Estonia",
    "Finland",
    "Croatia",
    "Hungary",
    "Luxembourg",
    "Malta",
    "Poland",
    "Portugal",
    "Sweden",
    "Slovenia",
    "Slovakia"
  )

google <-
    covid19(
        gmr = TRUE,
        country = countries,
        level = 3,
        start = "2020-01-01",
        end = Sys.Date(),
        # current date
        dir = "data"
    )

eguidotti commented 2 years ago

Hi @umbe1987, thanks for checking this out.

This seems to be working as intended. The gmr argument (like amr and wb) is meant to merge the epidemiological data with Google mobility data. If the epidemiological data are unavailable for a location, there is nothing to merge the mobility data into, which is why an empty data frame is returned.

If you are interested in the google mobility only, you can easily import the full dataset with:

library(data.table)
x <- fread("https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv")
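Once the full report is loaded, it can be narrowed down to one country. The following is only a minimal sketch: filter_mobility is a hypothetical helper (not part of any package), and it assumes the report keeps its current column names country_region_code and sub_region_1.

```r
library(data.table)

# Hypothetical helper: keep the rows of a Google mobility table for one
# country. By default the national rows (empty sub_region_1) are dropped.
# Assumes the columns country_region_code and sub_region_1 exist.
filter_mobility <- function(x, country_code, subnational_only = TRUE) {
  out <- x[country_region_code == country_code]
  if (subnational_only) {
    out <- out[!is.na(sub_region_1) & sub_region_1 != ""]
  }
  out
}

# Usage (requires the download above):
# romania <- filter_mobility(x, "RO")
```

Note that this returns Google's own region names and identifiers only; it does not add NUTS codes.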

Hope this helps!

umbe1987 commented 2 years ago

Thank you again for your feedback!

As you correctly said, I am only interested in the google mobility data, but it was nice to have NUTS codes for EU countries in the dataset. Do you know if this is possible for the aforementioned countries with this library?

siacus commented 2 years ago

Hi Emanuele, as far as I remember, you were able to match the identifiers of Google locations with NUTS and other labelling systems (e.g., GID). Would it be possible to return a data set with only Google data and empty epidemiological data anyway? It would be of great help, as it would avoid rewriting the matching code. Or do you know of any alternative way to go from Google identifiers to NUTS codes?

eguidotti commented 2 years ago

@siacus and @umbe1987

Returning a data set with only Google data and empty epidemiological data is not possible. The reason is that this repo is matching the locations associated with some epidemiological data with Google mobility and other identifiers. Not vice-versa.

That is, first I collect the epidemiological data, then I associate the geographical entities with google mobility and other identifiers. In other words, this repo contains only a subset of google mobility or other databases (i.e., their intersection with the epidemiological locations currently supported).
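The direction of the matching can be illustrated with a toy example. This is a sketch on invented data, not the package's actual code: the tables epi and mobility and their values are made up, and only the column name key_google_mobility is borrowed from the repository.

```r
library(data.table)

# Toy data, invented for illustration only.
epi <- data.table(
  id = c("A1", "A2"),
  key_google_mobility = c("place_1", NA),
  confirmed = c(10L, 20L)
)
mobility <- data.table(
  place_id = c("place_1", "place_9"),
  workplaces = c(-12, -30)
)

# Start from the epidemiological locations and look up mobility for each one.
# The result has one row per row of epi, so place_9 never shows up.
merged <- mobility[epi, on = .(place_id = key_google_mobility)]

# With no epidemiological rows there is nothing to attach mobility to.
empty <- mobility[epi[0], on = .(place_id = key_google_mobility)]
```

Here merged has two rows (one per epidemiological location) and empty has none, even though mobility itself is not empty.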

The mappings between the various systems are coded here:

You can build the complete map to match the data as follows. The object geoMap contains information for more than 13,000 locations for which the epidemiological data are implemented.

library(jsonlite)
library(data.table)

# List every file in the repository through the GitHub API
repo <- "covid19datahub/COVID19"
endpoint <- sprintf("https://api.github.com/repos/%s/git/trees/master?recursive=1", repo)
filesInRepo <- fromJSON(endpoint)
filesPaths <- filesInRepo$tree$path

# Keep the lookup tables in inst/extdata/db/, leaving the national file (ISO.csv) aside
geoPaths <- filesPaths[startsWith(filesPaths, "inst/extdata/db/")]
subNationalPaths <- geoPaths[geoPaths != "inst/extdata/db/ISO.csv"]

# Read each country file, tag it with the ISO code taken from the file name,
# and drop the rows without an id
subNationalData <- lapply(subNationalPaths, function(path){
  url <- sprintf("https://raw.githubusercontent.com/covid19datahub/COVID19/master/%s", path)
  data <- fread(url, na.strings = "", encoding = "UTF-8")
  data$iso_alpha_3 <- gsub("^inst/extdata/db/([A-Z]{3})\\.csv$", "\\1", path)
  data[!is.na(data$id),]
})
subNationalData <- rbindlist(subNationalData, fill = TRUE)
subNationalData <- subNationalData[,c(
  "iso_alpha_3",
  "administrative_area_level_2", 
  "administrative_area_level_3",
  "administrative_area_level",
  "key_local",
  "key_nuts",
  "key_gadm",
  "key_hasc",
  "key_jhu_csse",
  "key_apple_mobility",
  "key_google_mobility",
  "latitude",
  "longitude",
  "population")]

nationalData <- fread("https://raw.githubusercontent.com/covid19datahub/COVID19/master/inst/extdata/db/ISO.csv", na.strings = "", encoding = "UTF-8")
nationalData <- nationalData[,c("iso_alpha_3", "administrative_area_level_1")]

geoMap <- nationalData[subNationalData, on = "iso_alpha_3"]
geoMap
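With geoMap in hand, NUTS codes can be attached to a Google mobility table. The sketch below rests on an assumption that should be checked against the data: that key_google_mobility holds the place_id used in Google's report. add_nuts is a hypothetical helper, and only locations present in geoMap will receive a code.

```r
library(data.table)

# Hypothetical helper: add key_nuts from a geoMap-like table to a Google
# mobility table. Assumes key_google_mobility matches Google's place_id.
add_nuts <- function(mobility, geo_map) {
  # One row per Google key; locations without a Google key are ignored.
  keys <- unique(
    geo_map[!is.na(key_google_mobility), .(key_google_mobility, key_nuts)]
  )
  # Keep every mobility row; key_nuts is NA where geo_map has no match.
  merge(mobility, keys,
        by.x = "place_id", by.y = "key_google_mobility", all.x = TRUE)
}

# Usage, with x being the full Google report read via fread():
# mobilityWithNuts <- add_nuts(x, geoMap)
```

If one Google key is mapped to several NUTS codes in geoMap, the corresponding mobility rows are repeated once per code.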

Another way to associate NUTS with Google mobility is to use google_nuts_matchtable from the package regions. This contains NUTS 2016 instead of 2021 as far as I can see.

regions::google_nuts_matchtable

siacus commented 2 years ago

thanks a lot, very useful!

eguidotti commented 2 years ago

Happy to help! I'm now closing this issue. Feel free to open another one if anything else is needed. Cheers