The downloaded Excel files have extra columns named column 1, column 2, ..., which are not necessary. To read each file, please use:

Choosing columns:

```r
library(dplyr)

# Read only the header row to get the column names
col_names <- readxl::read_excel(path = "file_name.xlsx", n_max = 0)

# Type the year columns (names containing digits) as numeric, the rest as text,
# and drop the unnecessary "column ..." columns
column_range <- data.frame(names = names(col_names)) %>%
  mutate(type = ifelse(test = grepl(pattern = "[0-9]", x = names), yes = "numeric", no = "text")) %>%
  dplyr::filter(!grepl(pattern = "column", x = names, ignore.case = TRUE))
```

Reading files:

```r
ids_data <- readxl::read_excel(
  path = "file_name.xlsx",
  range = readxl::cell_cols(1:nrow(column_range)),
  col_types = column_range$type
)
```
Amazing, thank you, this worked! I'll finalize the PR later today.
I checked the licenses of readxl and its dependencies, and fortunately none of them use GPL3.
I see a few issues here, including some missing imports and timeouts on large downloads. I'll just go ahead and push a fix.
Sorry this is taking so long. I built tests and am trying to get them to pass.
@christophscheuch, I added interactive warnings for files larger than 100 MB and a customizable download timeout, with function arguments to control both behaviors. I also refactored `ids_bulk` and added a unit test suite. Take a look and see what you think about these design decisions from a UX perspective.
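For context, a minimal sketch of how a size warning plus a configurable timeout around `download.file` could look; the helper name and arguments (`download_with_checks`, `warn_size`, `timeout`) and the use of httr for the HEAD request are assumptions for illustration, not the actual `ids_bulk` implementation:

```r
library(httr)

# Hypothetical helper illustrating the behavior described above; argument names
# are illustrative, not the real ids_bulk() interface.
download_with_checks <- function(url, destfile, warn_size = TRUE, timeout = 600) {
  # Ask the server for the file size before downloading (may not be reported)
  size_header <- httr::headers(httr::HEAD(url))[["content-length"]]
  size_mb <- if (is.null(size_header)) NA_real_ else as.numeric(size_header) / 1024^2
  if (warn_size && interactive() && !is.na(size_mb) && size_mb > 100) {
    warning(sprintf("This file is about %.0f MB; the download may take a while.", size_mb))
  }
  # Temporarily raise R's download timeout, restoring the previous value on exit
  old <- options(timeout = max(timeout, getOption("timeout")))
  on.exit(options(old), add = TRUE)
  utils::download.file(url, destfile, mode = "wb", quiet = TRUE)
  invisible(destfile)
}
```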
Almost all my test cases are passing, but when I try to `read_excel` my chosen test file in the final, end-to-end test, I get the following error:

```
Error in utils::unzip(zip_path, list = TRUE): error -103 with zipfile in unzGetCurrentFileInfo
```
This happens with your last version as well as my updated version, so it may be something unexpected about the file rather than something wrong with the code. I have been testing with "https://datacatalogfiles.worldbank.org/ddh-published/0038015/DR0092201/A_D.xlsx?versionId=2024-10-08T01:35:39.3946879Z". What file were you using in your local testing to verify that the code works?
Ah, I think I solved it! You have to use mode = "wb" in `download.file` when downloading an Excel file, or it will fail for Windows users.
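For anyone hitting the same thing, a minimal sketch of the fix, with a placeholder URL and destination:

```r
# Minimal illustration of the fix; the URL and destination are placeholders.
url <- "https://example.com/A_D.xlsx"
destfile <- tempfile(fileext = ".xlsx")

# Without mode = "wb", download.file() writes in text mode on Windows, which
# corrupts binary formats like xlsx (.xlsx is not on the list of extensions for
# which download.file() switches to binary mode automatically).
utils::download.file(url, destfile, mode = "wb")
readxl::read_excel(destfile, n_max = 5)
```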
Still one test not passing, but pushed because I'm calling it a night!
> Still one test not passing, but pushed because I'm calling it a night!
Great work! I feel inspired to add more tests like these in my own packages in the future :)
I only had to add `test_path` to debug the last test, and I added validation for column names and types.
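A hedged sketch of what a test along these lines might look like; the fixture path, column names, and expected types are illustrative assumptions, not the package's actual test suite:

```r
library(testthat)

test_that("bulk file reads with expected columns and types", {
  # test_path() resolves relative to tests/testthat/, so the test also works
  # under R CMD check; "fixtures/sample.xlsx" is a hypothetical fixture
  path <- test_path("fixtures", "sample.xlsx")
  data <- readxl::read_excel(path)

  # Hypothetical column names and types, for illustration only
  expect_true(all(c("Country Code", "Series Code") %in% names(data)))
  expect_type(data[["Country Code"]], "character")
})
```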
> I only had to add `test_path` to debug the last test, and I added validation for column names and types.
Yes! Thanks for catching that! These are the sorts of things you overlook when you're exhausted because you keep telling yourself you'll go to bed as soon as the tests pass, and they keep not passing. :')
> Great work! I feel inspired to add more tests like these in my own packages in the future :)
I know a lot of people who consider test-writing a waste of time, but for me the feedback loop between tests and code is essential to my process. Not only does it save me having to manually check all the edge cases I want the code to handle, but it also documents what those edge cases are and how the code is supposed to handle them, which makes it a lot easier to write docs later!
I started with a draft of the functions to kick off the review because I'm not 100% sure where this should go.
Unfortunately, I cannot read in the xlsx files because my Mac has only 16 GB of RAM (`Error: vector memory limit of 16.0 Gb reached, see mem.maxVSize()`). Am I missing something about reading in the xlsx files? @Reu-hub @t-emery