Open LukasWallrich opened 3 months ago
I've just seen that this behaviour was framed as a bug in #137 - so we went back and forth on this.
If we keep it as is, we need to write in each example:
# Option A - if you used manual = TRUE during deduplication:
count_sources(dedup$unique)
# Option B - if you used manual = FALSE during deduplication:
count_sources(dedup)
Alternatively, I'd be happy to remove the manual option from dedup_citations, so that this always returns a dataframe - and offer a separate dedup_citations_with_manual that always returns a list? @TNRiley @kaitlynhair what do you think?
While the split function seems cleaner in my mind, I think the split argument is better in line with ASySD.
It would be great to chat about this at our next meeting if we can get everyone on the call. I don't know if I can think of all the possible impacts or pros/cons outside of what @LukasWallrich mentioned, and I think there would be more. I'll email everyone and see if we can all make it.
Wait for https://github.com/camaradesuk/ASySD/issues/49 - ideally dedup_citations() should behave the same in both packages.
I believe we decided at some point that - in line with ASySD - dedup_citations should always return a list, with deduped citations in unique and (optionally) potential pairs for manual deduplication in another item. However, currently, it sometimes returns a dataframe (when manual = FALSE) and sometimes a list (when manual = TRUE) - which makes it less intuitive to use, and harder to write examples.
Does anyone object to changing it in that way? It means that we need to call dedup_result$unique to access the deduplicated citations, but can obviously assign that to unique_citations in any vignettes if we want to.