USAID-OHA-SI / fastR

Import COP19/20 FAST ER Tool

Issue running map_dfr #3

Closed achafetz closed 5 years ago

achafetz commented 5 years ago

I'm having an issue with the code below, which runs run_fastR() on all the files. Some non-standardization across the files is breaking it. I need to investigate later today. (It has worked so far on a few one-off files.)

files <- list.files("C:/Users/achafetz/Downloads/FASTs 2-21-19", full.names = TRUE)
fast <- purrr::map_dfr(.x = files,
                       .f = ~ run_fastR(.x, "3 Initiative-E"))
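
One way to find which files break the combined run is to process each file separately and capture errors instead of stopping at the first one. The sketch below is only an illustration in base R: `read_one()` is a hypothetical stand-in for `run_fastR()`, and the file names are made up.

```r
# Hypothetical stand-in for run_fastR(); it errors on a "non-standard" file
read_one <- function(path) {
  if (grepl("nonstandard", path)) stop("cannot have NA as name")
  data.frame(file = path, rows = 1L, stringsAsFactors = FALSE)
}

# Made-up file names for illustration only
files <- c("ou_a.xlsx", "ou_b_nonstandard.xlsx", "ou_c.xlsx")

# Run each file on its own, recording the error message rather than aborting
results <- lapply(files, function(f) {
  tryCatch(
    list(file = f, data = read_one(f), error = NA_character_),
    error = function(e) list(file = f, data = NULL, error = conditionMessage(e))
  )
})

# Separate the files that failed from the ones that imported cleanly
failed_idx <- vapply(results, function(r) !is.na(r$error), logical(1))
failed <- files[failed_idx]

# Combine only the successful imports into one data frame
combined <- do.call(rbind, lapply(results[!failed_idx], function(r) r$data))
```

After this runs, `failed` lists the files to inspect by hand, and `combined` holds the rows from the files that imported without error.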

jaliasd commented 5 years ago

FWIW, I have a problem with map_dfr throwing an error when I try to run it through multiple site-level MSDs.

> map( .x = makey_rds, .f = ~site.msd(.x))
Error: cannot allocate vector of size 22.6 Mb
Called from: ifelse(. == "", NA, .)
Error during wrapup: cannot allocate vector of size 32.0 Mb

I just cut and paste the completed files into a temp folder and re-run on the remaining .txt files. I usually have to do this a few times on each run through the site-level MSDs.
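
The manual "move the finished files and re-run" step could be automated by writing each result to disk and skipping inputs that already have an output. This is a rough sketch under assumptions: `process_one()` is a hypothetical stand-in for the per-file step (for example `site.msd()`), the inputs are assumed to be tab-delimited text, and the `.rds` naming scheme is invented for illustration.

```r
# Hypothetical per-file step; replace with the real processing function
process_one <- function(path) {
  df <- read.delim(path, stringsAsFactors = FALSE)
  df$source <- basename(path)
  df
}

# Process only the inputs that do not yet have an output file in out_dir
process_pending <- function(paths, out_dir, fn = process_one) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  done <- character(0)
  for (p in paths) {
    out <- file.path(out_dir,
                     paste0(tools::file_path_sans_ext(basename(p)), ".rds"))
    if (file.exists(out)) next   # finished on an earlier run, so skip it
    saveRDS(fn(p), out)          # write the result instead of keeping it in memory
    done <- c(done, p)
    gc()                         # ask R to release memory before the next file
  }
  invisible(done)                # paths processed during this call
}
```

If a run dies with an allocation error, calling `process_pending()` again with the same arguments picks up at the first file without an output. Whether this avoids the memory error itself depends on how large each individual file is.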

jaliasd commented 5 years ago

This is the error I got when I tried to run run_fastR on just one of the files for the "3 Initiative-E" tab as well:

> fast <- purrr::map_dfr(.x = files, .f = ~ run_fastR(.x, "3 Initiative-E"))
Error: Columns 69, 70, 71, 72, 73, ... cannot have NA as name

jaliasd commented 5 years ago

And I'm still getting the same error on the "2 Intervention-E" tab. It's strange because I reinstalled the package and verified that identify_ou was referencing the correct cell (it was).

> identify_ou
function (df, filepath)
{
    ou <- readxl::read_excel(filepath, sheet = "1 PLL", range = "G4") %>% names()
    df <- df %>% dplyr::mutate(operatingunit = ou) %>% dplyr::select(operatingunit, dplyr::everything())
}
<bytecode: 0x00000000134801a8>

> fast <- purrr::map_dfr(.x = files, .f = ~ run_fastR(.x, "3 Initiative-E"))
Error: Columns `69`, `70`, `71`, `72`, `73`, ... cannot have NA as name
> #read in all FAST files and combine into one data frame
> df_all <- map_dfr(.x = files, .f = ~ run_fastR(.x, "2 Intervention-E"))
Error: Column `operatingunit` must be length 130 (the number of rows) or one, not 0

achafetz commented 5 years ago

Ugh, so there are 5 non-SGAC-checked files in the bunch of 25. I had to move the import of the OU name into a new function and then check whether the returned vector has length 0 (i.e., there is no OU name in the specified cell).

  #pull OU name from PLL sheet & store
  #ref cell changed with SGAC review to G4
  #if not, check in E4
    ou <- extract_ou(filepath, "G4")

    if(length(ou) == 0)
      ou <- extract_ou(filepath, "E4")

where extract_ou() is:

extract_ou <- function(filepath, cell){

  ou <- readxl::read_excel(filepath,
                     sheet = "1 PLL",
                     range = cell) %>%
    names()

  return(ou)

}

Updating the package in a bit and will close this out when it is resolved.
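
If more cell locations turn up, the G4-then-E4 fallback could be generalized to an ordered list of candidate cells. The sketch below is an assumption-laden illustration, not the packaged code: `extract_first_ou()` and its `reader` argument are hypothetical names, and `extract_ou()` is rewritten without the pipe only so the block stands alone.

```r
# Read a single cell from the "1 PLL" sheet as a header and return its name
extract_ou <- function(filepath, cell) {
  names(readxl::read_excel(filepath, sheet = "1 PLL", range = cell))
}

# Hypothetical helper: try each candidate cell in order and return the first
# non-empty result; `reader` is injectable so the logic can be checked
# without a real workbook
extract_first_ou <- function(filepath, cells = c("G4", "E4"),
                             reader = extract_ou) {
  for (cell in cells) {
    ou <- reader(filepath, cell)
    if (length(ou) > 0) return(ou)
  }
  # Fail loudly instead of passing a length-0 value on to dplyr::mutate()
  stop("No OU name found in cells ", paste(cells, collapse = ", "),
       " of ", filepath)
}
```

Stopping with an explicit message when no cell has a value makes the problem file obvious, rather than surfacing later as the "must be length 130 (the number of rows) or one, not 0" error.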