Closed seabbs closed 3 years ago
I agree all download steps should be ran in the $download
step. The issue around using data.frame
seems fixed by #190
I will open a PR to move all downloading steps into the correct area.
so the only issue here is if we ever need to download data we can't easily combine into a data.frame with lots of cleaning. The UK is perhaps an example of that as there would be 3 data.frames that are maybe not trivial to link?
If we can't we need raw
to be defined as a list and changes tests to handle that (which is not totally trivial but not that hard I think)/
hmmm yeah the UK is complex. I will push the PR im working on and then we can go from there in terms of linking things up.
sounds good.
At the moment the UK implementation downloads additional data in the cleaning method. This breaks the package standard and makes testing data downloading discretely from data cleaning difficult. It also does not support level 1 region codes when downloading level 2 data (which should have been the previous package standard but is open for discussion).
See download call in cleaning method here: https://github.com/epiforecasts/covidregionaldata/blob/0cae42df192bd96d59088587c02cc724e7029772/R/UK.R#L128
The UK implementation also changes the definition of
data$raw
from a data.frame used elsewhere to a list. This again causes some testing issues. I think the $raw should be a standard format, ideally adata.frame
, but it could be generalised if the appropriate updates were made. If we want to generalise need to relax testing to removedata.frame
tests onraw
data (lets discuss this).As part of this issue need to expand testing to cover downloading and cleaning the NHS regions dataset.
Looking at this some more I also see issues with region codes being downloaded during cleaning.
https://github.com/epiforecasts/covidregionaldata/blob/68c69ac07a467689ece242b9329ff28133701e95/R/UK.R#L145