Future-House / paper-qa

High accuracy RAG for answering questions from scientific documents with citations
Apache License 2.0

Crossref title searches are non-deterministic #530

Open mskarlin opened 1 month ago

mskarlin commented 1 month ago

When get_doc_details_from_crossref runs a title search, the result is not deterministic for titles that have more than one DOI, even though the query is limited to rows=1 (this is hard-coded). For example, searching for "A Perspective on Explanations of Molecular Prediction" randomly returns either "10.1021/acs.jctc.2c01235" or "10.26434/chemrxiv-2022-qfv02".

We should probably request 2 or 3 rows instead of 1 and, if more than one result is an exact title match, sort those matches by publication date so the same DOI is always chosen.
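A minimal sketch of the proposed tie-break, assuming the search now requests a few rows and hands back Crossref REST API work records (dicts with a `DOI` string, a `title` list, and `published`/`issued` objects holding `date-parts`). The function `pick_crossref_item`, the title normalization, the `newest_first` flag, and the final DOI tie-break are all hypothetical additions, not paper-qa's actual code. The issue does not say which sort direction is wanted, so the direction is left as a parameter.

```python
from __future__ import annotations

from typing import Any


def _normalize_title(title: str) -> str:
    # Hypothetical normalization: case-fold and collapse whitespace so that
    # trivially different renderings of the same title compare as equal.
    return " ".join(title.casefold().split())


def _publication_date(item: dict[str, Any]) -> tuple[int, int, int] | None:
    # Crossref dates look like {"date-parts": [[2023, 3, 1]]}; month and day
    # may be missing, so they are padded with 1. Prefer "published", then "issued".
    for key in ("published", "issued"):
        parts = (item.get(key) or {}).get("date-parts") or [[]]
        first = parts[0]
        if first and first[0] is not None:
            year, month, day = (list(first) + [1, 1])[:3]
            return (int(year), int(month), int(day))
    return None


def pick_crossref_item(
    query_title: str, items: list[dict[str, Any]], newest_first: bool = True
) -> dict[str, Any] | None:
    """Deterministically choose one Crossref record from a multi-row title search."""
    if not items:
        return None
    target = _normalize_title(query_title)
    exact = [
        item
        for item in items
        if any(_normalize_title(t) == target for t in item.get("title", []))
    ]
    if not exact:
        # No exact title match: fall back to Crossref's top-ranked result,
        # which is what the current rows=1 behavior returns.
        return items[0]

    def sort_key(item: dict[str, Any]) -> tuple[bool, tuple[int, ...], str]:
        date = _publication_date(item)
        ordinal: tuple[int, ...] = date or (0, 0, 0)
        if newest_first:
            # Negate so that the most recent date sorts first.
            ordinal = tuple(-x for x in ordinal)
        # Dated records before undated ones; the lowercase DOI breaks exact ties
        # so the choice never depends on the order Crossref returned the rows.
        return (date is None, ordinal, item.get("DOI", "").lower())

    return min(exact, key=sort_key)
```

Because the key ends with the DOI, the same set of rows always yields the same record regardless of the order in which Crossref returns them. The remaining source of variation is which 2 or 3 rows Crossref returns in the first place.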