camaradesuk / ASySD

https://camaradesuk.github.io/ASySD/
GNU General Public License v3.0

Deal with error if `duplicate_id` field is in input data #29

Closed: LukasWallrich closed this issue 8 months ago

LukasWallrich commented 1 year ago

Currently, ASySD fails with a rather cryptic error if the input data contains a `duplicate_id` field. Such a field should probably be renamed at the start, with a note to the user?

library(dplyr) # provides tibble() and %>%
tibble(
  title = c(rep(LETTERS[1:3], 2), LETTERS[7:9]),
  author = c(rep(LETTERS[1:3], 2), LETTERS[7:9]),
  journal = c(rep(LETTERS[1:3], 2), LETTERS[7:9]),
  year = c(2010:2012, 2010:2011, 2025, 2017:2019),
  source = c(rep("aa", 5), rep("bb", 4)),
  record_id = letters[1:9],
  label = "",
  duplicate_id = 1:9 # this column triggers the error
) %>% ASySD::dedup_citations()
#> Warning: The following columns are missing: doi, pages, volume, number, abstract, isbn
#> Are you sure you want to proceed? 
#> 
#> 1: Yes
#> 2: No
#> formatting data...
#> identifying potential duplicates...
#> identified duplicates!
#> Called from: generate_dup_id(true_pairs, raw_citations, keep_source, keep_label)
#> Joining with `by = join_by(record_id)`
#> Error in `mutate()`:
#> ℹ In argument: `ComponentID = ifelse(...)`.
#> Caused by error in `duplicate_id$ComponentID`:
#> ! $ operator is invalid for atomic vectors
#> Backtrace:
#>      ▆
#>   1. ├─... %>% ASySD::dedup_citations()
#>   2. ├─ASySD::dedup_citations(.)
#>   3. │ └─ASySD:::generate_dup_id(...)
#>   4. │   └─duplicate_id %>% right_join(raw_citations) %>% ... at <tmp>:16:4
#>   5. ├─dplyr::mutate(...)
#>   6. ├─dplyr:::mutate.data.frame(...)
#>   7. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
#>   8. │   ├─base::withCallingHandlers(...)
#>   9. │   └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
#>  10. │     └─mask$eval_all_mutate(quo)
#>  11. │       └─dplyr (local) eval()
#>  12. ├─base::ifelse(...)
#>  13. ├─base::paste0(max(duplicate_id$ComponentID) + row_number())
#>  14. └─base::.handleSimpleError(...)
#>  15.   └─dplyr (local) h(simpleError(msg, call))
#>  16.     └─rlang::abort(message, class = error_class, parent = parent, call = error_call)
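
The `$ operator is invalid for atomic vectors` message looks consistent with dplyr's data masking: inside `mutate()`, a bare name is looked up among the data's columns first. Once the joined data has a `duplicate_id` column, `duplicate_id$ComponentID` would therefore refer to that column rather than to a data frame of the same name. A minimal sketch of the mechanism follows; `pairs` and `records` are made-up objects for illustration, not ASySD internals, and the `.env` variant is only one possible way to avoid the clash.

```r
library(dplyr)

# Hypothetical lookup table, standing in for a data frame held in a local variable
pairs <- data.frame(record_id = c("a", "b"), ComponentID = c(1, 2))

# Hypothetical input with a column whose name clashes with the variable `pairs`
records <- data.frame(record_id = c("a", "b", "c"), pairs = 1:3)

# Inside mutate(), `pairs` resolves to the integer column, so `$` fails
masked <- tryCatch(
  records %>% mutate(new_id = max(pairs$ComponentID) + row_number()),
  error = function(e) e
)
inherits(masked, "error")

# The .env pronoun forces the lookup to happen in the calling environment
fixed <- records %>%
  mutate(new_id = max(.env$pairs$ComponentID) + row_number())
fixed$new_id
```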

LukasWallrich commented 1 year ago

A similar issue occurs when record_ids are in the input data, although the error message there is slightly more interpretable.
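
One possible workaround on the user side is to rename clashing columns before calling `dedup_citations()` and print a note, along the lines suggested above. The sketch below is base R only. `rename_reserved_cols()` is a hypothetical helper, not part of ASySD, and the default `reserved` names are an assumption based on the two cases mentioned in this thread.

```r
# Hypothetical helper, not part of ASySD: rename input columns that clash
# with names assumed to be reserved, and tell the user what was renamed.
rename_reserved_cols <- function(df,
                                 reserved = c("duplicate_id", "record_ids"),
                                 suffix = "_original") {
  clashes <- intersect(names(df), reserved)
  if (length(clashes) > 0) {
    new_names <- paste0(clashes, suffix)
    message(
      "Renamed reserved column(s): ",
      paste0(clashes, " -> ", new_names, collapse = ", ")
    )
    # Replace each clashing name in place, keeping column order
    names(df)[match(clashes, names(df))] <- new_names
  }
  df
}

# Usage with a small made-up input
input <- data.frame(title = c("A", "B"), duplicate_id = 1:2)
cleaned <- rename_reserved_cols(input)
names(cleaned)
```

The cleaned data frame could then be passed to `ASySD::dedup_citations()`. The sketch does not check whether the suffixed name already exists in the input.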

kaitlynhair commented 8 months ago

Now resolved in the latest version.