ESHackathon / CiteSource

http://www.eshackathon.org/CiteSource/
GNU General Public License v3.0
17 stars 1 forks source link

Should dedup_citations always return list? #199

Open LukasWallrich opened 3 months ago

LukasWallrich commented 3 months ago

I believe we decided at some point that - in line with ASySD - dedup_citations should always return a list, with deduped citations in unique and (optionally) potential pairs for manual deduplication in another item. However, currently, it sometimes returns a dataframe (when manual = FALSE) and sometimes a list (when manual = TRUE) - which makes it less intuitive to use, and harder to write examples.

Does anyone object to changing it in that way? It means that we need to call dedup_result$unique to access the deduplicated citations, but can obviously assign that to unique_citations in any vignettes if we want to.

LukasWallrich commented 3 months ago

I've just seen that this behaviour was framed as a bug in #137 - so we went back and forth on this.

If we keep it as is, we need to write in each example:

# Option A - if you used manual = TRUE during deduplication:

count_sources(dedup$unique)

# Option B - if you used manual = FALSE during deduplication:

count_sources(dedup)

Alternatively, I'd be happy to remove the manual option from dedup_citations, so that this always returns a dataframe - and offer a separate dedup_citations_with_manual that always returns a list? @TNRiley @kaitlynhair what do you think?

TNRiley commented 3 months ago

While the split function seems cleaner in my mind, I think the split argument is better in line with ASySD.

It would be great to chat about this at our next meeting if we can get everyone on the call. I don't know if I can think of all the possible impacts or pros/cons outside of what @LukasWallrich mentioned, and I think there would be more. I'll email everyone and see if we can all make it.

LukasWallrich commented 1 month ago

Wait for https://github.com/camaradesuk/ASySD/issues/49 - ideally dedup_citations() should behave the same in both packages.