epiforecasts / covidregionaldata

An interface to subnational and national level COVID-19 data. For all countries supported, this includes a daily time-series of cases. Wherever available we also provide data on deaths, hospitalisations, and tests. National level data is also supported using a range of data sources as well as linelist data and links to intervention data sets.
https://epiforecasts.io/covidregionaldata/
Other
37 stars 18 forks source link

Refactor UK to meet R6 standard #192

Closed seabbs closed 3 years ago

seabbs commented 3 years ago

At the moment the UK implementation downloads additional data in the cleaning method. This breaks the package standard and makes testing data downloading discretely from data cleaning difficult. It also does not support level 1 region codes when downloading level 2 data (which should have been the previous package standard but is open for discussion).

See download call in cleaning method here: https://github.com/epiforecasts/covidregionaldata/blob/0cae42df192bd96d59088587c02cc724e7029772/R/UK.R#L128

The UK implementation also changes the definition of data$raw from a data.frame used elsewhere to a list. This again causes some testing issues. I think the $raw should be a standard format, ideally a data.frame, but it could be generalised if the appropriate updates were made. If we want to generalise need to relax testing to remove data.frame tests on raw data (lets discuss this).

As part of this issue need to expand testing to cover downloading and cleaning the NHS regions dataset.

Looking at this some more I also see issues with region codes being downloaded during cleaning.

https://github.com/epiforecasts/covidregionaldata/blob/68c69ac07a467689ece242b9329ff28133701e95/R/UK.R#L145

joseph-palmer commented 3 years ago

I agree all download steps should be ran in the $download step. The issue around using data.frame seems fixed by #190

I will open a PR to move all downloading steps into the correct area.

seabbs commented 3 years ago

so the only issue here is if we ever need to download data we can't easily combine into a data.frame with lots of cleaning. The UK is perhaps an example of that as there would be 3 data.frames that are maybe not trivial to link?

If we can't we need raw to be defined as a list and changes tests to handle that (which is not totally trivial but not that hard I think)/

joseph-palmer commented 3 years ago

hmmm yeah the UK is complex. I will push the PR im working on and then we can go from there in terms of linking things up.

seabbs commented 3 years ago

sounds good.