
This R package provides access to a wide variety of data sources about Covid-19 in a standardized tidy data format.
GNU General Public License v3.0

covid19R

Lifecycle: experimental

The goal of covid19R is to provide a single package through which users can access all of the tidy Covid-19 datasets collected by data packages that implement the covid19R tidy data standard.

To learn more about the Covid19R project, see our extensive documentation, which covers the data standards, how to get your data added to this list, and more.

Installation

You can install the development version from GitHub with:

remotes::install_github("covid19r/covid19r")

Getting the Data Information

To see which datasets are available, use get_covid19_data_info():

library(covid19R)

data_info <- get_covid19_data_info()

# Written without the pipe so that it does not depend on dplyr being attached yet
knitr::kable(head(data_info))
| data_set_name | package_name | function_to_get_data | data_details | data_url | license_url | data_types | location_types | spatial_extent | has_geospatial_info | get_info_passing | refresh_status | last_refresh_update |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| covid19nytimes_states | covid19nytimes | refresh_covid19nytimes_states | Open Source data from the New York Times on distribution of confirmed Covid-19 cases and deaths in the US States. For more, see https://www.nytimes.com/article/coronavirus-county-data-us.html or the readme at https://github.com/nytimes/covid-19-data. | https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv | https://github.com/nytimes/covid-19-data/blob/master/LICENSE | cases_total, deaths_total | state | country | FALSE | TRUE | Passed | 2020-05-04 16:08:36 |
| covid19nytimes_counties | covid19nytimes | refresh_covid19nytimes_counties | Open Source data from the New York Times on distribution of confirmed Covid-19 cases and deaths in the US by County. For more, see https://www.nytimes.com/article/coronavirus-county-data-us.html or the readme at https://github.com/nytimes/covid-19-data. | https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv | https://github.com/nytimes/covid-19-data/blob/master/LICENSE | cases_total, deaths_total | state | country | FALSE | TRUE | Passed | 2020-05-04 16:08:39 |
| covid19france | covid19france | refresh_covid19france | Open Source data from opencovid19-fr on distribution of confirmed Covid-19 cases and deaths in the US States. For more, see https://github.com/opencovid19-fr/data. | https://raw.githubusercontent.com/opencovid19-fr/data/master/dist/chiffres-cles.csv | https://github.com/opencovid19-fr/data/blob/master/LICENSE | confirmed, dead, icu, hospitalized, recovered, discovered | county, region, country, overseas collectivity | country | FALSE | TRUE | Passed | 2020-05-04 16:08:47 |
| CanadaC19_cases | CanadaC19 | refresh_CanadaC19_cases | Open Source data from multiple public reporting data throughout Canada. For more, see https://github.com/ishaberry/Covid19Canada. | https://raw.githubusercontent.com/ishaberry/Covid19Canada/master/cases.csv | https://github.com/debusklaneml/CanadaC19/blob/master/LICENSE | cases_new | state | state | FALSE | TRUE | Passed | 2020-05-04 16:08:48 |
| covid19us | covid19us | refresh_covid19us | Open Source data from COVID Tracking Project on the distribution of Covid-19 cases and deaths in the US. For more, see https://github.com/opencovid19-fr/data. | https://covidtracking.com/api | https://github.com/aedobbyn/covid19us/blob/master/LICENSE.md | positive, negative, pending, hospitalized_currently, hospitalized_cumulative, in_icu_currently, in_icu_cumulative, on_ventilator_currently, on_ventilator_cumulative, recovered, death, hospitalized, total, total_test_results, death_increase, hospitalized_increase, negative_increase, positive_increase, total_test_results_increase | state | country | FALSE | TRUE | Passed | 2020-05-04 16:08:50 |
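
Because data_types is stored as a comma-separated string, finding the datasets that report a particular data type takes one extra step. The sketch below is a minimal illustration rather than part of the package: it copies the first four rows of the table above into a small data frame (named data_info_subset here purely for the example) so that it runs offline. In practice the same steps would presumably apply to the result of get_covid19_data_info().

```r
# Hand-copied from the first four rows of the table above so the example
# runs offline; data_info_subset is an illustrative name, not a package object.
data_info_subset <- data.frame(
  data_set_name = c("covid19nytimes_states", "covid19nytimes_counties",
                    "covid19france", "CanadaC19_cases"),
  data_types = c("cases_total, deaths_total",
                 "cases_total, deaths_total",
                 "confirmed, dead, icu, hospitalized, recovered, discovered",
                 "cases_new"),
  spatial_extent = c("country", "country", "country", "state"),
  stringsAsFactors = FALSE
)

# Split each comma-separated data_types string into a character vector.
types_per_dataset <- strsplit(data_info_subset$data_types, ",\\s*")

# Keep datasets that report the wanted data type (exact match, not substring).
wanted_type <- "deaths_total"
has_type <- vapply(types_per_dataset,
                   function(types) wanted_type %in% types,
                   logical(1))

datasets_with_deaths <- data_info_subset$data_set_name[has_type]
print(datasets_with_deaths)
```

Splitting before matching avoids false positives that a plain substring search could produce, for example "death" matching "deaths_total".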

Accessing data

Once you have decided which dataset you want, you can access it with get_covid19_dataset():

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

nytimes_states <- get_covid19_dataset("covid19nytimes_states")
#> Parsed with column specification:
#> cols(
#>   date = col_date(format = ""),
#>   location = col_character(),
#>   location_type = col_character(),
#>   location_code = col_character(),
#>   location_code_type = col_character(),
#>   data_type = col_character(),
#>   value = col_double()
#> )

nytimes_states %>%
  filter(date == max(date)) %>%
  filter(data_type == "cases_total") %>%
  arrange(desc(value)) %>%
  head()
#> # A tibble: 6 x 7
#>   date       location location_type location_code location_code_t… data_type
#>   <date>     <chr>    <chr>         <chr>         <chr>            <chr>    
#> 1 2020-05-03 New York state         36            fips_code        cases_to…
#> 2 2020-05-03 New Jer… state         34            fips_code        cases_to…
#> 3 2020-05-03 Massach… state         25            fips_code        cases_to…
#> 4 2020-05-03 Illinois state         17            fips_code        cases_to…
#> 5 2020-05-03 Califor… state         06            fips_code        cases_to…
#> 6 2020-05-03 Pennsyl… state         42            fips_code        cases_to…
#> # … with 1 more variable: value <dbl>
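
Since cases_total is a running total, a common next step is to derive daily increases per location. The following is a minimal sketch under stated assumptions: toy_states is a made-up tibble that mimics a few of the columns shown above, and its values are invented for illustration, not real counts. The same pipeline should work on a real dataset with these columns.

```r
library(dplyr)

# Toy data in the column layout shown above; the values are invented
# for illustration and are not real case counts.
toy_states <- tibble(
  date = as.Date(c("2020-05-01", "2020-05-02", "2020-05-03",
                   "2020-05-01", "2020-05-02", "2020-05-03")),
  location = rep(c("State A", "State B"), each = 3),
  data_type = "cases_total",
  value = c(100, 130, 150, 10, 10, 25)
)

# Daily increase = today's cumulative total minus the previous day's,
# computed separately for each location.
daily_new <- toy_states %>%
  filter(data_type == "cases_total") %>%
  group_by(location) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(new_cases = value - lag(value)) %>%
  ungroup()

print(daily_new)
```

The first row of each location has no previous day, so its new_cases is NA.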

The covid19R Data Standard

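While many datasets have their own additional columns (e.g., Latitude, Longitude, population), all datasets are arranged in a long format and share the columns shown in the column specification above: date, location, location_type, location_code, location_code_type, data_type, and value.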
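
A quick way to check whether a data frame carries the shared columns is to compare its names against the column specification printed when loading the dataset above. The helper below, missing_standard_columns(), is hypothetical and not part of covid19R; it is only a sketch of such a check.

```r
# Column names taken from the column specification printed above.
standard_columns <- c("date", "location", "location_type", "location_code",
                      "location_code_type", "data_type", "value")

# Hypothetical helper (not part of covid19R): returns the standard
# columns that are missing from a data frame.
missing_standard_columns <- function(df) {
  setdiff(standard_columns, names(df))
}

# One row modelled on the New York row above, deliberately lacking `value`.
example_row <- data.frame(
  date = as.Date("2020-05-03"),
  location = "New York",
  location_type = "state",
  location_code = "36",
  location_code_type = "fips_code",
  data_type = "cases_total",
  stringsAsFactors = FALSE
)

missing <- missing_standard_columns(example_row)
print(missing)
```

An empty result means all of the shared columns are present; extra dataset-specific columns are ignored by this check.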

Vocabularies

The location_type, location_code_type, and data_type columns in the datasets, and the spatial_extent column in the data info table, each have their own controlled vocabulary. Others might be introduced as the collection of packages matures. To see the possible values of a standardized vocabulary, use get_covid19_controlled_vocab():

get_covid19_controlled_vocab("location_type") %>%
  knitr::kable()
| location_type | description |
|---|---|
| continent | continental scale |
| country | a country with soverign borders |
| state | a spatial area inside that country such as a state, province, canton, etc. |
| county | a spatial area demarcated within a state |
| city | a single municipality - the smallest spatial grain of government in a country |
| canton | the cantons of Switzerland and Principality of Liechtenstein (FL) |
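
A controlled vocabulary makes it straightforward to spot values that fall outside the standard. The sketch below is illustrative only: it hard-codes the location_type values listed above rather than calling the package, and invalid_values() is a hypothetical helper, not a covid19R function.

```r
# location_type values copied from the table above; in practice they
# would presumably come from get_covid19_controlled_vocab("location_type").
location_type_vocab <- c("continent", "country", "state",
                         "county", "city", "canton")

# Hypothetical helper (not part of covid19R): returns the distinct
# values that are not in the vocabulary.
invalid_values <- function(values, vocab) {
  setdiff(unique(values), vocab)
}

# "province" is not a vocabulary term; per the table it maps to "state".
observed <- c("state", "county", "province", "state")
not_in_vocab <- invalid_values(observed, location_type_vocab)
print(not_in_vocab)
```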