geomarker-io / riseup_geomarker_pipeline

A Multi-Modal Geomarker Pipeline for Assessing the Impact of Social, Economic, and Environmental Factors on Pediatric Hospitalization

geocoder fail ? #14

Closed erikarasnick closed 1 year ago

erikarasnick commented 1 year ago

When I geocode the parsed_addresses, the first run through the geocoder fails with the error below. In the past, we have told people to just run it again, which works as long as they are able to use the geocoding_cache. So I can get this to work if I skip degauss_run, call degauss directly at the command line, let it fail the first time, and then run it again.

I'm guessing this is related to RAM? I have about 8 GB allocated to Docker right now.

── Welcome to DeGAUSS! ──

• You are using geocoder, version 3.3.0
• This container returns geocodes
• <https://degauss.org/geocoder>

ℹ removing non-alphanumeric characters...
ℹ removing excess whitespace...
ℹ flagging PO boxes...
ℹ flagging known Cincinnati foster & institutional addresses...
ℹ flagging non-address text and missing addresses...
ℹ now geocoding ...
  |================================| 100%, Elapsed 01:01:51
16 errors occurred
the first one occurred in element 1225:

Error in readRDS(file = file.path(path, key)): error reading from connection

Error in `dplyr::mutate()`:
! Problem while computing `geocodes = purrr::map(geocodes, ~.x %>%
  purrr::map(unlist) %>% as_tibble())`.
Caused by error in `purrr::map()`:
ℹ In index: 1225.
Caused by error:
! Column 1 must be named.
Use .name_repair to specify repair.
Caused by error in `repaired_names()`:
! Names can't be empty.
✖ Empty name found at location 1.
Backtrace:
     ▆
  1. ├─... %>% dplyr::arrange(desc(precision), score)
  2. ├─dplyr::arrange(., desc(precision), score)
  3. ├─dplyr::mutate(...)
  4. ├─dplyr::select(., -fips_county, -prenum, -number, -row_index)
  5. ├─dplyr::rename(...)
  6. ├─dplyr::ungroup(.)
  7. ├─dplyr::slice(., 1)
  8. ├─dplyr::group_by(., row_index)
  9. ├─tidyr::unnest(., cols = c(geocodes))
 10. ├─dplyr::mutate(...)
 11. ├─dplyr:::mutate.data.frame(...)
 12. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), caller_env = caller_env())
 13. │   ├─base::withCallingHandlers(...)
 14. │   └─mask$eval_all_mutate(quo)
 15. ├─purrr::map(geocodes, ~.x %>% purrr::map(unlist) %>% as_tibble())
 16. │ └─purrr:::map_("list", .x, .f, ..., .progress = .progress)
 17. │   ├─purrr:::with_indexed_errors(...)
 18. │   │ └─base::withCallingHandlers(...)
 19. │   └─.f(.x[[i]], ...)
 20. │     └─.x %>% purrr::map(unlist) %>% as_tibble()
 21. ├─tibble::as_tibble(.)
 22. └─tibble:::as_tibble.list(.)
 23.   └─tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
 24.     └─tibble:::set_repaired_names(x, repair_hint = TRUE, .name_repair)
 25.       ├─rlang::set_names(...)
 26.       └─tibble:::repaired_names(names2(x), repair_hint, .name_repair = .name_repair, quiet = quiet)
 27.         ├─tibble:::subclass_name_repair_errors(...)
 28.         │ └─base::withCallingHandlers(...)
 29.         └─vctrs::vec_as_names(...)
 30.           └─vctrs (local) `<fn>`()
 31.             └─vctrs:::validate_unique(names = names, arg = arg, call = call)
 32.               └─vctrs:::stop_names_cannot_be_empty(names, call = call)
 33.                 └─vctrs:::stop_names(...)
 34.                   └─vctrs:::stop_vctrs(...)
 35.                     └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = vctrs_error_call(call))
Execution halted
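A possible workaround sketch, given that the readRDS "error reading from connection" points at an unreadable geocoding_cache entry (likely written during the failed run): delete zero-byte cache files so those addresses get re-geocoded on the next pass. The cache directory path and the zero-byte symptom are assumptions here, not confirmed from this run.

```shell
#!/bin/sh
# Hypothetical cache-repair pass. CACHE_DIR is an assumption; point it at
# wherever your geocoding_cache actually lives. Cache entries interrupted
# mid-write commonly end up as zero-byte files, and readRDS() on such a
# file fails with "error reading from connection". Deleting them forces
# only those addresses to be re-geocoded, keeping the rest of the cache.
CACHE_DIR="${1:-./geocoding_cache}"
find "$CACHE_DIR" -type f -size 0 -print -delete
```

This only helps if zero-byte files are in fact the corruption mode; a partially written non-empty file would need a readRDS-based check from R instead.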
erikarasnick commented 1 year ago

The roads container also stopped partway through (probably RAM again), and aadt finished and wrote output but then gave an error that vector memory was exhausted. I can try again with more RAM allocated to Docker...

Also, I am trying this on my old MacBook, because the roads container "got stuck" under arm64 emulation.
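For reference, a quick way to check whether the container is running natively or under emulation is to look at the host architecture and, if needed, pin the platform explicitly (the image name and tag below are illustrative assumptions, not taken from this thread):

```shell
# Host CPU architecture: arm64/aarch64 means linux/amd64-only images
# will run under emulation, which can hang or be very slow.
uname -m
# Pinning the platform makes the emulation choice explicit rather than
# implicit (image name/tag are hypothetical placeholders):
# docker run --rm --platform linux/amd64 ghcr.io/degauss-org/roads:latest --help
```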

cole-brokamp commented 1 year ago

cole to verify

cole-brokamp commented 1 year ago

I've always been working with a subset of the data, so I haven't seen this problem yet. We will keep an eye out for it when we scale up to the entire dataset.