epiforecasts / covidregionaldata

An interface to subnational and national level COVID-19 data. For all countries supported, this includes a daily time-series of cases. Wherever available we also provide data on deaths, hospitalisations, and tests. National level data is also supported using a range of data sources as well as linelist data and links to intervention data sets.
https://epiforecasts.io/covidregionaldata/

Add source text and url fields to classes #388

Closed RichardMN closed 3 years ago

RichardMN commented 3 years ago

This is an approach to adding a source text and source url field to each of our regional data sources. (Once I've done all the regional data sources I may go back and do the other sources as well.) This would be a fix for #375.

I'm leaving this as a draft for now because it's incomplete but would welcome advice (@joseph-palmer ?) on this approach even before I've done it for all the datasets.

My goal is that a user could get the source_text or source_url programmatically, so that if they are using data prepared by someone it is straightforward to give credit (and possibly a link). This could even be done in markdown with something like paste0("[", data$source_text, "](", data$source_url, ")") (with some care to handle the case where there is no URL). In general I think we should always be able to provide a source_text and usually be able to give a URL. Looking at the first few, it's clear that some of these datasets don't have straightforward "entry points", but we can provide something people can look at (or click on) without having to pick through our code.
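As a rough illustration of the markdown idea, here is a minimal base R sketch. The helper name format_source_md is hypothetical and not part of covidregionaldata; it only assumes that a source text is always present and that the URL may be NA or empty.

```r
# Hypothetical helper (not part of covidregionaldata): build a markdown
# credit from a source text and an optional source URL.
format_source_md <- function(source_text, source_url = NA_character_) {
  # A URL counts as present only if it is neither NA nor an empty string.
  has_url <- !is.na(source_url) & nzchar(source_url)
  # With a URL, emit a markdown link; otherwise fall back to the plain text.
  ifelse(
    has_url,
    paste0("[", source_text, "](", source_url, ")"),
    source_text
  )
}

# Example calls using values from the datasets discussed in this PR.
format_source_md(
  "Wesley Cota",
  "https://github.com/wcota/covid19br/blob/master/README.en.md"
)
format_source_md("Government of Mexico", NA)
```

Because it is built on ifelse(), the same function also works on whole columns, for example format_source_md(data$source_text, data$source_url).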

seabbs commented 3 years ago

Nice @RichardMN - this looks good to me and all seems sensible. For use downstream etc. it might be cool to provide a citation/source method (so DataClass$citation() and/or DataClass$source()), which, like the base R variant, returns a nicely formatted version of this info ready to be used in a plot, etc. That might be quite a bit more work (mostly thinking), so perhaps park it in another issue if you're not keen to deal with it here.
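To make the suggestion concrete, a citation-style helper could look roughly like the sketch below. This is only an assumption about what such a method might return: format_citation and its arguments are hypothetical, it is written as a plain function rather than a DataClass method, and the wording of the citation string is invented for illustration.

```r
# Hypothetical sketch (not the package's API): format a one-line,
# plot-ready citation from the source fields proposed in this PR.
format_citation <- function(origin, source_text,
                            source_url = NA_character_,
                            accessed = Sys.Date()) {
  cite <- paste0("Data for ", origin, ": ", source_text)
  # Append the URL only when one is available.
  if (!is.na(source_url) && nzchar(source_url)) {
    cite <- paste0(cite, " (", source_url, ")")
  }
  paste0(cite, ". Accessed ", format(accessed, "%Y-%m-%d"), ".")
}

# Example call with the Cuba source fields; the date is fixed so the
# result is reproducible.
format_citation(
  "Cuba",
  "COVID19 Cuba Data team",
  "https://covid19cubadata.github.io/#cuba",
  accessed = as.Date("2021-07-01")
)
```

The returned string could then be passed to, say, a plot caption. Inside a class, the same logic would read the fields from the object instead of taking them as arguments.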

RichardMN commented 3 years ago

I've pulled in an interim fix to #389 here as well as completing my first pass of providing source texts and urls for the country regional data classes. I would welcome anyone familiar with these datasets to provide a sanity check (and corrections where necessary) on these.

@Bisaloo - I cannot quite figure out where the French data comes from. We have pointers to CSV files and I cannot line them up with what is available on the data.gouv.fr website. I've not looked that deeply, but you may already know what it would take me half an hour of spelunking to find out.

I plan to try to do the other sources (ECDC, JRC, JHU, WHO) next. The citation() and source() functions are a good idea but may take a bit more thinking on how to apply them.

Question: should the field names be changed to something like credit_text and credit_url instead of source_*?

seabbs commented 3 years ago

Totally agree that the helper functions are for another PR. For the field name I prefer source vs credit but no strong opinion.

RichardMN commented 3 years ago

I've now added source fields for the remaining data classes.

Question: Should we document that the ECDC data source terminated in late December?

RichardMN commented 3 years ago

For easier review of the list, below is a table of the current contents of all_country_data.

| origin | class | level_1_region | level_2_region | level_3_region | type | data_urls | source_data_cols | source_text | source_url |
|---|---|---|---|---|---|---|---|---|---|
| Belgium | Belgium | region | province | NA | regional | https://epistat.sciensano.be/Data/COVID19BE_MORT.csv, https://epistat.sciensano.be/Data/COVID19BE_CASES_AGESEX.csv, https://epistat.sciensano.be/Data/COVID19BE_HOSP.csv | cases_new, deaths_new | Sciensano (Belgian institute of health) | https://epistat.wiv-isp.be/covid/ |
| Brazil | Brazil | state | city | NA | regional | https://github.com/wcota/covid19br/raw/master/cases-brazil-cities-time.csv.gz | cases_total, deaths_total | Wesley Cota | https://github.com/wcota/covid19br/blob/master/README.en.md |
| Canada | Canada | province | NA | NA | regional | https://health-infobase.canada.ca/src/data/covidLive/covid19.csv | cases_new, cases_total, deaths_new, recovered_total, tested_new | Public Health Infobase, Public Health Agency of Canada | https://open.canada.ca/data/en/dataset/261c32ab-4cfd-4f81-9dea-7b64065690dc |
| Colombia | Colombia | departamento | NA | NA | regional | https://raw.githubusercontent.com/danielcs88/colombia_covid-19/master/datos/cronologia.csv | cases_total | Daniel Cárdenas | https://github.com/danielcs88/colombia_covid-19/ |
| Covid-19 Data Hub | Covid19DataHub | country | region | subregion | national | https://storage.covid19datahub.io/rawdata-1.csv | confirmed, deaths, recovered, tested, hosp | COVID-19 Data Hub | https://covid19datahub.io |
| Cuba | Cuba | provincia | NA | NA | regional | https://covid19cubadata.github.io/data/covid19-casos.csv | cases_new | COVID19 Cuba Data team | https://covid19cubadata.github.io/#cuba |
| European Centre for Disease Control (ECDC) | ECDC | country | NA | NA | national | https://opendata.ecdc.europa.eu/covid19/casedistribution/csv | cases_new, deaths_new | European Centre for Disease Prevention and Control (ECDC) | https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide |
| France | France | region | department | NA | regional | https://www.data.gouv.fr/fr/datasets/r/001aca18-df6a-45c8-89e6-f82d689e6c01 | cases_new, tested_new | French Public Open Data Platform | https://www.data.gouv.fr/fr/pages/donnees-coronavirus |
| Germany | Germany | bundesland | landkreis | NA | regional | https://opendata.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0.csv | cases_new, deaths_new | Robert Koch-Institut (RKI) | https://hub.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0/explore |
| Google | Google | country | subregion | subregion2 | national | https://storage.googleapis.com/covid19-open-data/v2/epidemiology.csv, https://storage.googleapis.com/covid19-open-data/v2/hospitalizations.csv, https://storage.googleapis.com/covid19-open-data/v2/index.csv | new_confirmed, new_deceased, new_recovered, new_tested, total_confirmed, total_deceased, total_recovered, total_tested | O. Wahltinez and others | https://github.com/GoogleCloudPlatform/covid-19-open-data |
| India | India | state | NA | NA | regional | https://api.covid19india.org/csv/latest/state_wise_daily.csv | cases_new, deaths_new, recovered_new | COVID19India | https://www.covid19india.org |
| Italy | Italy | regioni | NA | NA | regional | https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-regioni/dpc-covid19-ita-regioni.csv | cases_total, deaths_total, tested_total | Department of Civil Protection, Italy | https://github.com/pcm-dpc/COVID-19/blob/master/README_EN.md |
| John Hopkins University (JHU) | JHU | country | region | NA | national | https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv, https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv, https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv | confirmed, deaths, recovered | Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU) | https://github.com/CSSEGISandData/COVID-19/ |
| European Commission's Joint Research Centre (JRC) | JRC | country | region | NA | national | https://raw.githubusercontent.com/ec-jrc/COVID-19/master/data-by-country/jrc-covid-19-all-days-by-country.csv | CumulativePositive, CuulativeDeceased, CumulativeRecovered, CurrentlyPositive, Hospitalized, IntensiveCare | European Commission Joint Research Centre (JRC) | https://github.com/ec-jrc/COVID-19 |
| Lithuania | Lithuania | county | municipality | NA | regional | https://opendata.arcgis.com/datasets/d49a63c934be4f65a93b6273785a8449_0.csv | cases_new, tested_new, recovered_total, deaths_new | Lithuanian Statistics Department | https://hub.arcgis.com/datasets/d49a63c934be4f65a93b6273785a8449_0/about |
| Mexico | Mexico | estado | municipio | NA | regional | Downloads/filesDD.php?csvaxd, https://datos.covid-19.conacyt.mx/ | cases_new, deaths_new | Government of Mexico | https://datos.covid-19.conacyt.mx |
| Netherlands | Netherlands | province | municipality | NA | regional | https://data.rivm.nl/covid-19/COVID-19_aantallen_gemeente_per_dag.csv | cases_new, deaths_new, hosp_new | National Institute for Public Health and the Environment (RIVM), Netherlands | https://data.rivm.nl/covid-19/ |
| South Africa | SouthAfrica | province | NA | NA | regional | https://raw.githubusercontent.com/dsfsi/covid19za/master/data/covid19za_provincial_cumulative_timeline_confirmed.csv, https://raw.githubusercontent.com/dsfsi/covid19za/master/data/covid19za_provincial_cumulative_timeline_deaths.csv | cases_new, deaths_new, recovered_new | Data Science for Social Impact research group, University of Pretoria | https://github.com/dsfsi/covid19za |
| Switzerland | Switzerland | canton | NA | NA | regional | https://github.com/openZH/covid_19/raw/master/COVID19_Fallzahlen_CH_total_v2.csv | hosp_new, deaths_total, recovered_total, cases_total, tested_total | Open Data, Canton of Zurich | https://github.com/openZH/covid_19/ |
| United Kingdom (UK) | UK | region | authority | NA | regional | https://www.england.nhs.uk/statistics, https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2021/04/COVID-19-daily-admissions-and-beds-20210406-1.xlsx, https://api.coronavirus.data.gov.uk/v2/data | newCasesBySpecimenDate, cumCasesBySpecimenDate, newCasesByPublishDate, cumCasesByPublishDate, newDeaths28DaysByPublishDate, cumDeaths28DaysByPublishDate, newDeaths28DaysByDeathDate, cumDeaths28DaysByDeathDate, newTestsByPublishDate, cumTestsByPublishDate, newAdmissions, cumAdmissions, newPillarOneTestsByPublishDate, newPillarTwoTestsByPublishDate, newPillarThreeTestsByPublishDate, newPillarFourTestsByPublishDate | Public Health England | https://coronavirus.data.gov.uk/ |
| United States of America (USA) | USA | state | county | NA | regional | https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv | cases_total, deaths_total | New York Times | https://github.com/nytimes/covid-19-data |
| World Health Organisation (WHO) | WHO | country | NA | NA | national | https://covid19.who.int/WHO-COVID-19-global-data.csv | cases_new, cases_total, deaths_new, deaths_total | World Health Organisation | https://covid19.who.int |

RichardMN commented 3 years ago

It seems that the workflow actions are no longer triggered or run on this PR? Or, at least, I cannot trigger them myself.

It's passed CMD-check on my system (macOS, R 4.1.0) at home but I can't test the other build environments. I could try running it on my branch, but I don't know whether the results would show here.

@seabbs or @joseph-palmer you might be able to get them to run?

seabbs commented 3 years ago

Merging and will fix quickly on master.