camaradesuk / ASySD

https://camaradesuk.github.io/ASySD/
GNU General Public License v3.0

Benchmarking performance of ASySD (+ list of overlooked duplicates to potentially review) #3

Open LukasWallrich opened 2 years ago

LukasWallrich commented 2 years ago

Before including it in a registered report, I wanted to evaluate the performance of ASySD, using the method recently published here:

McKeown, S., & Mir, Z. M. (2021). Considerations for conducting systematic reviews: Evaluating the performance of different methods for de-duplicating references. Systematic Reviews, 10(1), 1-8.

Your algorithm does really well on what I see as the most important outcome: there are no false positives, i.e. no unique entries get lost. On false negatives, however, it could do better, with 86% sensitivity - though that is already better than any of the common reference managers, so I am definitely excited about using this going forward.

Details on the benchmark are here - when you are documenting this package, you might want to include something like that? Also, there is a CSV of false negatives there (i.e. overlooked duplicates) that might be worth reviewing?
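
For context, the sensitivity and specificity in this kind of benchmark come from comparing the tool's removals against the labelled duplicates in the test set. A minimal sketch of that calculation (not the actual benchmark code; the column names `true_duplicate` and `removed_by_tool` are just illustrative):

```r
# Minimal sketch (not the actual benchmark code): each record in the labelled
# test set carries a ground-truth duplicate flag and a flag for whether the
# tool removed it. Column names here are illustrative.
library(dplyr)

evaluate_dedup <- function(results) {
  results %>%
    summarise(
      tp = sum(true_duplicate & removed_by_tool),    # duplicates correctly removed
      fn = sum(true_duplicate & !removed_by_tool),   # duplicates missed (overlooked)
      tn = sum(!true_duplicate & !removed_by_tool),  # unique records correctly kept
      fp = sum(!true_duplicate & removed_by_tool)    # unique records wrongly removed
    ) %>%
    mutate(
      sensitivity = tp / (tp + fn),  # share of true duplicates caught
      specificity = tn / (tn + fp)   # share of unique records preserved
    )
}
```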

kaitlynhair commented 2 years ago

Thanks for this!

I have performed an evaluation of the ASySD Shiny app by testing performance across 5 biomedical datasets versus other tools (publication coming soon, I hope; preprint here: https://www.biorxiv.org/content/10.1101/2021.05.04.442412v1). I will need to add this to the documentation!

As I mentioned in the other issue, this package has been edited a lot recently to introduce new functionality. Although the method for identifying duplicate citations has not changed, I think there was a bug in the new function that merges duplicates and retains additional meta-data (as opposed to only keeping one full citation from each duplicate group). I'm not sure how easy it is to run your benchmark again, but I expect the results may have changed(?). In our evaluation we calculated an overall sensitivity and specificity of ~95%, but this could vary across research domains.
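
For clarity, "merging duplicates and retaining additional meta-data" means collapsing each duplicate group into a single record while filling fields from all group members, rather than keeping one member verbatim. A rough illustration of the idea, not ASySD's actual implementation (`duplicate_id` is an assumed grouping column):

```r
# Rough illustration only (not ASySD internals): collapse each duplicate group
# to one record, taking the first non-missing value of every field across the
# group instead of keeping a single member verbatim.
library(dplyr)

merge_duplicate_groups <- function(citations) {
  citations %>%
    group_by(duplicate_id) %>%   # `duplicate_id` is an assumed grouping column
    summarise(
      across(everything(), ~ dplyr::first(stats::na.omit(.x))),
      .groups = "drop"
    )
}
```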

LukasWallrich commented 2 years ago

Sure, it'd be easy to rerun that ... it should work without any manual input. Let me know when you think the version is stable, and then I'd be happy to give it another go. The dataset used in that article also seems to be biomedical, so I am not sure why the sensitivity is a fair bit lower than in your test.
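
Rerunning would essentially mean passing the labelled test dataset through the package's deduplication entry point and recomputing sensitivity/specificity with something like the helper above. A sketch, assuming `dedup_citations()` is the relevant function, that it returns the retained citations under `$unique`, and that the benchmark data carry `record_id` and `true_duplicate` columns - the exact signature, required columns, and return structure should be checked against the current version:

```r
# Sketch of a benchmark rerun; the function name, return structure, file name
# and column names below are assumptions, not verified against the current API.
library(ASySD)
library(dplyr)

benchmark <- readr::read_csv("benchmark_citations.csv")  # labelled test data (hypothetical file)

dedup_result <- dedup_citations(benchmark)               # run ASySD's deduplication
retained_ids <- dedup_result$unique$record_id            # assumed return structure

scored <- benchmark %>%
  mutate(removed_by_tool = !record_id %in% retained_ids)

evaluate_dedup(scored)                                   # helper from the earlier sketch
```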

LukasWallrich commented 1 year ago

Just FYI: I refreshed the benchmark (as I am now registering that registered report) ... and with the current version, specificity is still at 100% (no false positives, except for a mistake in the test dataset), and sensitivity is up to 91%. Details here