Closed LukasWallrich closed 1 month ago
I can't replicate this issue - can you test this again and let me know if it still has warnings? I believe min/max should work for strings / letters as well as numeric values.
Odd ... if I run this code, I still get the warning (after installing the current version). However, you are right that it is not about min/max - and not about non-numeric IDs ... instead, it always arises with this test data:
> tibble(
+ title = c(rep(LETTERS[1:3], 2), LETTERS[7:9]), author = c(rep(LETTERS[1:3], 2), LETTERS[7:9]),
+ journal = c(rep(LETTERS[1:3], 2), LETTERS[7:9]),
+ year = c(rep(2010:2012, 2), 2017:2019), source = c(rep("aa", 5), rep("bb", 4)),
+ record_id = 1:9, label = ""
+ ) %>% ASySD::dedup_citations()
Warning: The following columns are missing: doi, pages, volume, number, abstract, isbn
Are you sure you want to proceed?
1: Yes
2: No
Selection: 1
formatting data...
identifying potential duplicates...
identified duplicates!
Joining with `by = join_by(record_id)`
flagging potential pairs for manual dedup...
Joining with `by = join_by(duplicate_id.x, duplicate_id.y)`
9 citations loaded...
3 duplicate citations removed...
6 unique citations remaining!
$unique
# A tibble: 6 × 14
duplicate_id record_ids title author journal year source label doi pages volume number abstract isbn
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 1, 4 A A A 2010 aa, aa unknown, unknown "" "" "" "" "" ""
2 2 2, 5 B B B 2011 aa, aa unknown, unknown "" "" "" "" "" ""
3 3 3, 6 C C C 2012 aa, bb unknown, unknown "" "" "" "" "" ""
4 7 7 G G G 2017 bb unknown "" "" "" "" "" ""
5 8 8 H H H 2018 bb unknown "" "" "" "" "" ""
6 9 9 I I I 2019 bb unknown "" "" "" "" "" ""
$manual_dedup
# A tibble: 0 × 41
# ℹ 41 variables: author1 <chr>, author2 <chr>, author <dbl>, title1 <chr>, title2 <chr>, title <dbl>, abstract1 <lgl>, abstract2 <lgl>,
# abstract <dbl>, year1 <int>, year2 <int>, year <dbl>, number1 <lgl>, number2 <lgl>, number <dbl>, pages1 <lgl>, pages2 <lgl>, pages <dbl>,
# volume1 <lgl>, volume2 <lgl>, volume <dbl>, journal1 <chr>, journal2 <chr>, journal <dbl>, isbn <dbl>, isbn1 <lgl>, isbn2 <lgl>, doi1 <lgl>,
# doi2 <lgl>, doi <dbl>, record_id1 <chr>, record_id2 <chr>, label1 <chr>, label2 <chr>, source1 <chr>, source2 <chr>, duplicate_id.x <chr>,
# duplicate_id.y <chr>, match <lgl>, min_id <chr>, max_id <chr>
Warning message:
There were 2 warnings in `mutate()`.
The first warning was:
ℹ In argument: `min_id = min(duplicate_id.x, duplicate_id.y)`.
Caused by warning in `min()`:
! no non-missing arguments, returning NA
ℹ Run dplyr::last_dplyr_warnings() to see the 1 remaining warning.
I cannot provide a proper reprex because of Error in menu(c("Yes", "No"), title = paste("Are you sure you want to proceed?")): menu() cannot be used non-interactively
- could that be changed to allow for automated use? E.g., just by saying if (interactive()) menu() else warning()?
This happens because maybe_pairs is empty - so it is the same as
tibble(a = character(0)) %>% mutate(min_a = min(a))
So you should probably check whether there are any maybe_pairs and exit the function prematurely if there aren't?
I think this has now been resolved - it now returns without errors for me.
To allow automated use, the user_input argument set to 1 allows you to bypass the menu - perhaps not the most obvious feature so might need to rename this.
Just checked and this is now solved - thanks
I appreciate the support for non-numeric record IDs - yet currently, that leads to warnings and possibly subsequent errors here. Could that be
sort(c(duplicate_id.x, duplicate_id.y))[1]
and then belowsort(c(duplicate_id.x, duplicate_id.y))[2]
? Then it would work for numbers and strings?