Closed mitchellmanware closed 8 months ago
It's because we wanted to leave them in the same folder as the data they generated, plus the GitHub tests didn't like these download functions because of an issue with needing read-write privileges to download files. It was a discussion we had with @Spatiotemporal-Exposures-and-Toxicology and @sigmafelix so they may be able to explain more in detail.
This is actually a good thing to bring up because it may inform how we organize (& test) the covariate download scripts and data folders.
That all makes sense, thank you. Your point about testing the download scripts is what brought the question up.
@MAKassien @mitchellmanware Thank you for bringing up the issue. I think we could include download-and-preprocess functions into ./R, but my concern is it should not pass the test, as we briefly discussed the write error for testing download functions. I think that we could add some test scripts for such functions, which expect errors. Then the function is partly (at least) tested. Ideally, all download functions would rest in ./R/download.R
(for example).
@MAKassien @mitchellmanware @sigmafelix
When the download functions are in ./R/ then the test-coverage and R-CMD checks initiate their run and ending up redownloading the data. I agree with @sigmafelix that we should them all in one R function in R/ folder. However, we have to do something to bypass their running the download commands. Some potential options include (1) skip specific lines of line code in tests and build. (2) Add something to the R code that checks if the files exist, if so, then skip the download lines.
Is anyone else having trouble accessing the VPN or is it just me?
Update- My wifi decided to stop for while 😅
Follow up- November 22, 2023 @Spatiotemporal-Exposures-and-Toxicology @sigmafelix @MAKassien
As mentioned in the previous comments, a potential way to add the download functions into the .R/ folder is by adding a "test mode" argument to the function. Using the argument to bypass the download command could cover most of the code in the function. For example, only two lines of code would be uncovered in the following scenario (download_geos_cf_data.R).
# above is rest of the download function #
#### 11. test mode
if (test_mode == FALSE) {
#### 12 download data
system(command = system_command)
#### 13 remove "..._wget_commands.txt" file
file.remove(commands_txt)
}
We can write tests to check that the ..._wget_commands.txt
file exists, as well as the contents of the file (ie. number of rows matches expected based on defined date sequence) without actually executing the download commands.
Would the test coverage and R-CMD check read-write privileges also prohibit the creation of the ..._wget_commands.txt
file?
Thanks @mitchellmanware I've added this to the new version of download_tri functuon that I will push to branch #185 today. I think the original problem was with web downloads specifically, so creating a local file like that .txt may not be a problem? Although Kyle and Insang may have a better idea of how this works. @Spatiotemporal-Exposures-and-Toxicology @sigmafelix
@mitchellmanware @MAKassien
We might allow writing files in GitHub Action runner by changing yaml settings. Before implementing changes in GitHub Action side, how about generating a character vector internally and sink it if an argument is set TRUE
? In corresponding test scripts, we could only get the character vector with URLs and do some "presence check" with a function in httr
package (StackOverflow post). For example:
testthat::test_that("Check if files on Web addresses are present", {
withr::local_package("httr")
# provided that download_foo includes an argument to return character object
urllist <- download_foo(..., return_urls = TRUE)
urlFileExist <- function(url){
HTTP_STATUS_OK <- 200
hd <- httr::HEAD(url)
status <- hd$all_headers[[1]]$status
return(status == HTTP_STATUS_OK)
}
# randomly pick urls and check
url_pick <- sample(urllist, 30L, replace = FALSE)
url_status <- sapply(url_pick, urlFileExist)
testthat::expect_true(is.character(urllist))
testthat::expect_true(all(url_status))
})
Of course, we should modify DESCRIPTION
file to allow httr
in GitHub Action runners. This is another discussion topic for next Monday.
@MAKassien @sigmafelix
The test_mode
argument in this example does what Insang suggested. It sinks a character vector (.txt file) with all of the commands but does not execute the download command.
> ################################################################################
> testthat::test_that("Check if files on Web addresses are present", {
+ withr::local_package("httr")
+ # provided that download_foo includes an argument to return character object
+
+ # download arguments
+ directory_to_save <- "/Volumes/manwareme/function_copies/tests/data/"
+ collection <- "chm_tavg_1hr_g1440x721_v1"
+ date_start <- "2018-01-01"
+ date_end <- "2018-12-31"
+
+ # download
+ download_geos_cf_data(date_start = date_start,
+ date_end = date_end,
+ collection = collection,
+ directory_to_save = directory_to_save,
+ data_download_acknowledgement = TRUE,
+ test_mode = TRUE)
+
+ # read commands
+ wget_path <- paste0(directory_to_save,
+ collection,
+ "_wget_commands.txt")
+ wget_commands <- read.csv(wget_path, header = FALSE)
+
+ # convert to character
+ wget_commands <- wget_commands[1:nrow(wget_commands),]
+
+ # extract URLs from command
+ urllist <- NULL
+ for (w in seq_along(wget_commands)) {
+ string <- wget_commands[w]
+ url <- str_split_i(string, " ", 2)
+ urllist <- c(urllist, url)
+ }
+
+ urlFileExist <- function(url){
+ HTTP_STATUS_OK <- 200
+ hd <- httr::HEAD(url)
+ status <- hd$all_headers[[1]]$status
+ return(status == HTTP_STATUS_OK)
+ }
+
+ # randomly pick urls and check
+ url_pick <- sample(urllist, 30L, replace = FALSE)
+ url_status <- sapply(url_pick, urlFileExist)
+
+ testthat::expect_true(is.character(urllist))
+ testthat::expect_true(all(url_status))
+
+ })
Test passed 🎉
Similar to other tests, the urlFileExist
function maybe should live in the R/ folder as it will be applied to all of the download function tests.
@sigmafelix add rvest and httr to description file in Imports
Current plan:
Potential update:
@mitchellmanware I think we can close this. Please confirm and close if so.
@Spatiotemporal-Exposures-and-Toxicology Confirming complete.
Why do download/process functions live in the 'input' folder and not 'R'.