Closed biocyberman closed 2 years ago
I’ll try to make some detailed comments later today or tomorrow but there are two things which I think will need to be looked at based on my very superficial first glance:
jsonlite
as a dépends in DESCRIPTION
? I suspect we’re already implicitly depending on it through something else. I’d looked at this when considering an alternate approach for German data #410 but then the German data problem resolved itselfdownload
method instead of clean_common
(See that closed PR)Hi @RichardMN Thanks for checking this out. And the detailed review so far. I will be happy to answer all of the during tomorrow. So just a quick answer below for now.
I’ll try to make some detailed comments later today or tomorrow but there are two things which I think will need to be looked at based on my very superficial first glance:
* Do we need to add `jsonlite` as a dépends in `DESCRIPTION`? I suspect we’re already implicitly depending on it through something else. I’d looked at this when considering an alternate approach for German data [Switch to REST approach for downloading Germany date - possible fix for #408 #410](https://github.com/epiforecasts/covidregionaldata/pull/410) but then the German data problem resolved itself
The data source emits json data, so I used jsonlite
. It is therefore a good idea to add it to DESCRIPTION
.
* I think the download work should be done in an overclassed `download` method instead of `clean_common` (See that closed PR)
I will take a look at that. That would also explain why I got weird behavior of
covidregionaldata::get_regional_data
where it downloaded data fromcommon_data_urls
.
One more fiddly - add "Nam" into the WORDLIST file. This will avoid the spellchecker complaining about it being an odd word.
My force push
may cause extra work for reviewing. Sorry for that.
FWIW, I created a simplified version of the front end here for focusing on Vietnam data: https://biocyberman.github.io/covid
Thank you for all the work to integrate Vietnam into
covidregionaldata
It's my pleasure and out of necessity to be honest :)
When I look at the map of where we have sub-national data and coverage and where we don't there are lots of white spaces and it's important to fill them in, which can encourage people to use the data for all sorts of purposes.
Do you mean that there are subnational regions that do not have data? I am trying to push to have complete coverages But there are multiple administration layers we have to get through and get that done.
As I noted in another note, this is similar in some ways to work I did in #410 and I think that the package as a whole should be able to use JSON inputs. I think a lot of this code can probably just be moved into a
download
method and then it may play more nicely with the rest of the class.Like I answered in other replies, I refactored the download part to
download
method.The other thing to look at is running the tests and R-CMD check on the package, and looking at the outputs of the test builds done by github (on MacOS, linux and Windows). I only have access to one OS at home but sometimes the errors from other platforms have shown me glitches which my home system was protecting me from. I haven't written any test for this, and can you elaborate more on R-CMD command to use? Nonetheless, I have built the package and installed in on my local PC to use for other packages (e.g.
covid-rt-estimate
,covid
(front-end))
When I look at the map of where we have sub-national data and coverage and where we don't there are lots of white spaces and it's important to fill them in, which can encourage people to use the data for all sorts of purposes.
Do you mean that there are subnational regions that do not have data? I am trying to push to have complete coverages But there are multiple administration layers we have to get through and get that done.
My remark was about our world map which we put into our logo - we have coverage for several European and American countries, but very few in Asia or Africa. Adding Vietnam is a step towards addressing this.
I haven't written any test for this, and can you elaborate more on R-CMD command to use? Nonetheless, I have built the package and installed in on my local PC to use for other packages (e.g.
covid-rt-estimate
,covid
(front-end))
Try running devtools::check()
on the package. This will run a number of tests (which GitHub runs on the PR automatically) and identify problems which probably need to be resolved before the PR can be merged. (For various reasons, including that the package will get booted from cran if the issues are present in the code, and it took some effort to get it back onto cran.)
@biocyberman just a note to say thanks for this and great job porting our website and backend code over for Vietnam. Will look at integrating that other work tomorrow (hopefully).
So, a bit of exploratory work.
There is now a branch here called json-reader which has a very straightforward implementation of a drop-in download_JSON
.
There is also a branch called json-reader-vietnam which uses that implementation and the bulk of @biocyberman's Vietnam code to start using that approach to pull in data.
This gets me to the point where I have four lists of lists (of lists?):
The first three are list of 63 lists (one for each province), where the items of the list are the date (as a name) and a value (as an integer):
library(covidregionaldata)
vn <- covidregionaldata::Vietnam$new()
vn$download()
#> Downloading data from https://covid.ncsc.gov.vn/api/v3/covid/provinces?filter_type=case_by_time
#> Downloading data from https://covid.ncsc.gov.vn/api/v3/covid/provinces?filter_type=death_by_time
#> Downloading data from https://covid.ncsc.gov.vn/api/v3/covid/provinces?filter_type=recovered_by_time
#> Downloading data from https://covid.ncsc.gov.vn/api/v3/covid/provinces
vn$data$raw$main[[1]][[1]][515:519]
#> $`13/9/2021`
#> [1] 4310
#>
#> $`14/9/2021`
#> [1] 4331
#>
#> $`15/9/2021`
#> [1] 4345
#>
#> $`16/9/2021`
#> [1] 4360
#>
#> $`17/9/2021`
#> [1] 4360
Created on 2021-09-16 by the reprex package (v2.0.1)
This is where someone else's R skills will outstrip mine to figure out how to convert each of these lists into a table of date and count (with appropriate column label), then apply the number index of that list as the province code (possibly prefixing it with VN-
), and then bind the rows, and then full join the three resulting tables. I'm sure I could stare at the dplyr
crib sheet for a while but I don't think I can do that tonight.
@biocyberman just a note to say thanks for this and great job porting our website and backend code over for Vietnam. Will look at integrating that other work tomorrow (hopefully).
Great to hear. Thank you for yours and the team's work :+1:
So, a bit of exploratory work.
There is now a branch here called json-reader which has a very straightforward implementation of a drop-in
download_JSON
.There is also a branch called json-reader-vietnam which uses that implementation and the bulk of @biocyberman's Vietnam code to start using that approach to pull in data.
Great. I will take a look at this and incorporate into the code.
This gets me to the point where I have four lists of lists (of lists?):
* main (cases by date) * death_by_time * recovered_by_time * provinces
The first three are list of 63 lists (one for each province), where the items of the list are the date (as a name) and a value (as an integer):
library(covidregionaldata) vn <- covidregionaldata::Vietnam$new() vn$download() #> Downloading data from https://covid.ncsc.gov.vn/api/v3/covid/provinces?filter_type=case_by_time #> Downloading data from https://covid.ncsc.gov.vn/api/v3/covid/provinces?filter_type=death_by_time #> Downloading data from https://covid.ncsc.gov.vn/api/v3/covid/provinces?filter_type=recovered_by_time #> Downloading data from https://covid.ncsc.gov.vn/api/v3/covid/provinces vn$data$raw$main[[1]][[1]][515:519] #> $`13/9/2021` #> [1] 4310 #> #> $`14/9/2021` #> [1] 4331 #> #> $`15/9/2021` #> [1] 4345 #> #> $`16/9/2021` #> [1] 4360 #> #> $`17/9/2021` #> [1] 4360
Created on 2021-09-16 by the reprex package (v2.0.1)
This is where someone else's R skills will outstrip mine to figure out how to convert each of these lists into a table of date and count (with appropriate column label), then apply the number index of that list as the province code (possibly prefixing it with
VN-
), and then bind the rows, and then full join the three resulting tables. I'm sure I could stare at thedplyr
crib sheet for a while but I don't think I can do that tonight.
Yes it was challening to get the final data.frame with dplyr
. So for the time being, I used the rather whole school way todo it from #L74 - #L96 here: https://github.com/biocyberman/covidregionaldata/blob/vietnam/R/Vietnam.R#L74
Province code is stored in covidregionaldata::vietnam_codes
.
Here's a version using purrr
. It could still use lubridate
to be applied to the date field.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
library(purrr)
vn <- covidregionaldata::Vietnam$new()
vn$download()
#> Downloading data from https://covid.ncsc.gov.vn/api/v3/covid/provinces?filter_type=case_by_time
#> Downloading data from https://covid.ncsc.gov.vn/api/v3/covid/provinces?filter_type=death_by_time
#> Downloading data from https://covid.ncsc.gov.vn/api/v3/covid/provinces?filter_type=recovered_by_time
#> Downloading data from https://covid.ncsc.gov.vn/api/v3/covid/provinces
vn_data_inputs <- vn$data$raw[1:3]
vn_flat_all <- map(map(vn_data_inputs, function(x) as_tibble(unlist(x), rownames="date")),
function(x) {x %>% separate(date, sep="[.]+", into=c(NA, "province", "date"))})
full_join(
full_join(
vn_flat_all$case_by_time, vn_flat_all$death_by_time, by=c("province","date"),suffix=c(".cases", ".deaths"),copy=TRUE),
vn_flat_all$recovered_by_time, by=c("province","date"), suffix=c("",".recovered"), copy=TRUE) %>%
select(province, date, cases = value.cases, deaths=value.deaths, recovered = value)
#> # A tibble: 32,464 × 5
#> province date cases deaths recovered
#> <chr> <chr> <int> <int> <int>
#> 1 1 18/3/2020 0 0 0
#> 2 1 19/3/2020 0 0 0
#> 3 1 20/3/2020 0 0 0
#> 4 1 21/3/2020 0 0 0
#> 5 1 22/3/2020 0 0 0
#> 6 1 23/3/2020 0 0 0
#> 7 1 24/3/2020 0 0 0
#> 8 1 25/3/2020 0 0 0
#> 9 1 26/3/2020 0 0 0
#> 10 1 27/3/2020 0 0 0
#> # … with 32,454 more rows
Created on 2021-09-17 by the reprex package (v2.0.1)
And the json-reader-vietnam
branch now appears to have a fully "tidy" approach to cleaning. (Tidy, but not entirely pretty in some places.) Messy as it is, I still think it's easier to follow. There are likely some glitches of treatment of region codes.
Today I think I/we/@biocyberman can try to write a generic JSON downloader which works for Vietnam and acts like the CSV downloader we have. Making it more flexible can be a next step, with the condition that it doesn’t break the Vietnam code.
@RichardMN I looked at json_reader
and json_read_vietnam
branches. There would be a couple of errors in the new Vietnam.R
and I can fix quickly.. However, the json_reader
branch need to be merged to master first before I can update this PR to incorporate the json_reader feature. Otherwise, we can merge this PR and use json_reader
later.
@RichardMN I looked at
json_reader
andjson_read_vietnam
branches. There would be a couple of errors in the newVietnam.R
and I can fix quickly.. However, thejson_reader
branch need to be merged to master first before I can update this PR to incorporate the json_reader feature. Otherwise, we can merge this PR and usejson_reader
later.
Later today I will bring across the functional parts of the json-reader branch into this PR, and then it can go into master from this PR. It makes sense to introduce it where we may use it. I will put the changes in and @biocyberman can review and check the sanity on the data. I think it better to have the cleaner approach in the master branch from the start, but also to be using the new download_json
method when it's added.
If you'd prefer I can make a separate PR for the json-reader branch but I think the code's been pretty much reviewed and it can go from here.
@RichardMN I looked at
json_reader
andjson_read_vietnam
branches. There would be a couple of errors in the newVietnam.R
and I can fix quickly.. However, thejson_reader
branch need to be merged to master first before I can update this PR to incorporate the json_reader feature. Otherwise, we can merge this PR and usejson_reader
later.Later today I will bring across the functional parts of the json-reader branch into this PR, and then it can go into master from this PR. It makes sense to introduce it where we may use it. I will put the changes in and @biocyberman can review and check the sanity on the data. I think it better to have the cleaner approach in the master branch from the start, but also to be using the new
download_json
method when it's added.If you'd prefer I can make a separate PR for the json-reader branch but I think the code's been pretty much reviewed and it can go from here.
In that case I think we can do both json_reader and Vietnam.R in this PR, both to give context and to avoid re-review the json_reader PR.
So I've been through all the changes. I also ran covid-rt-estimates
package on top of this new version of covidregionaldata
. It works. So a green light from me :)
@RichardMN and @seabbs I think this PR got stuck because of the failure with codecov/patch. I am not familiar its working and therefore would not make it unstuck quickly. Can you give a push there? Thanks
Merged in the changes from master, bumped version, updated news, added @biocyberman as a contributor, and ran tests.
Great to see this get merged. Thank you for your support @RichardMN, @seabbs @Bisaloo
As the title says, I would like to add the subnational data for Vietnam. The covid situation their needs closer look into subnational level. This tool would be very helpful. Thanks