Currently, ASySD::dedup_citations() fails with a rather cryptic error if the input data already contains a duplicate_id column. Such a column should probably be renamed at the start, with a note to the user.
library(dplyr) # provides tibble() and %>%

# Minimal input that already contains a duplicate_id column
tibble(
  title = c(rep(LETTERS[1:3], 2), LETTERS[7:9]),
  author = c(rep(LETTERS[1:3], 2), LETTERS[7:9]),
  journal = c(rep(LETTERS[1:3], 2), LETTERS[7:9]),
  year = c(2010:2012, 2010:2011, 2025, 2017:2019),
  source = c(rep("aa", 5), rep("bb", 4)),
  record_id = letters[1:9], label = "", duplicate_id = 1:9
) %>%
  ASySD::dedup_citations()
#> Warning: The following columns are missing: doi, pages, volume, number, abstract, isbn
#> Are you sure you want to proceed?
#>
#> 1: Yes
#> 2: No
#> formatting data...
#> identifying potential duplicates...
#> identified duplicates!
#> Called from: generate_dup_id(true_pairs, raw_citations, keep_source, keep_label)
#> Joining with `by = join_by(record_id)`
#> Error in `mutate()`:
#> ℹ In argument: `ComponentID = ifelse(...)`.
#> Caused by error in `duplicate_id$ComponentID`:
#> ! $ operator is invalid for atomic vectors
#> Backtrace:
#> ▆
#> 1. ├─... %>% ASySD::dedup_citations()
#> 2. ├─ASySD::dedup_citations(.)
#> 3. │ └─ASySD:::generate_dup_id(...)
#> 4. │ └─duplicate_id %>% right_join(raw_citations) %>% ... at <tmp>:16:4
#> 5. ├─dplyr::mutate(...)
#> 6. ├─dplyr:::mutate.data.frame(...)
#> 7. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
#> 8. │ ├─base::withCallingHandlers(...)
#> 9. │ └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
#> 10. │ └─mask$eval_all_mutate(quo)
#> 11. │ └─dplyr (local) eval()
#> 12. ├─base::ifelse(...)
#> 13. ├─base::paste0(max(duplicate_id$ComponentID) + row_number())
#> 14. └─base::.handleSimpleError(...)
#> 15. └─dplyr (local) h(simpleError(msg, call))
#> 16. └─rlang::abort(message, class = error_class, parent = parent, call = error_call)
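The traceback suggests that an internal object named duplicate_id inside generate_dup_id() is shadowed by the input column of the same name, though I have not checked the source to confirm this. A minimal sketch of the suggested rename-with-a-note step could look like the following. The helper name rename_reserved_column() and the "_original" suffix are my own assumptions for illustration and are not part of ASySD; the sketch uses base R only.

```r
# Hypothetical helper (not part of ASySD): rename a clashing input column
# before deduplication and tell the user about it.
rename_reserved_column <- function(data, reserved = "duplicate_id") {
  if (!reserved %in% names(data)) {
    return(data) # nothing to do
  }

  # make.unique() appends a numeric suffix if the candidate name is taken,
  # so the new name never collides with an existing column.
  candidates <- make.unique(c(names(data), paste0(reserved, "_original")))
  new_name <- candidates[length(candidates)]

  names(data)[names(data) == reserved] <- new_name
  warning(
    "Input column `", reserved, "` clashes with a name used internally; ",
    "it was renamed to `", new_name, "`.",
    call. = FALSE
  )
  data
}

# Usage: the warning is suppressed here only to keep the example output short.
citations <- data.frame(record_id = letters[1:3], duplicate_id = 1:3)
cleaned <- suppressWarnings(rename_reserved_column(citations))
names(cleaned)
#> [1] "record_id"             "duplicate_id_original"
```

Renaming rather than dropping the column keeps the user's data intact, and a warning (instead of an error) lets the deduplication proceed. Whether a warning or a message is the better fit is a design choice for the maintainers.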