AviMaayan closed this issue 9 months ago.
This seems a bit tricky, given that the preprint and the published article have completely separate PMC IDs and tables. If there is metadata somewhere linking the two, that could help; I'll look into it.
Update:
I dug a bit and found a reference in Crossref that points from the preprint's DOI to the published article's DOI: https://api.crossref.org/works/10.1101/2021.02.20.431155
{"relation":{"is-preprint-of":[{"id-type":"doi","id":"10.1016\/j.cell.2021.07.023","asserted-by":"subject"}]}}
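To make the relation above concrete, here is a minimal sketch that pulls the published DOIs out of a Crossref work record. It assumes the work record is the `message` object of the Crossref response (the fragment quoted above is its `relation` field); the function name `published_dois` is hypothetical.

```python
import json

def published_dois(work: dict) -> list[str]:
    """Return the DOIs this work is a preprint of, per its Crossref `relation` field."""
    relations = work.get("relation", {})
    return [
        rel["id"]
        for rel in relations.get("is-preprint-of", [])
        if rel.get("id-type") == "doi"  # relations can also point at non-DOI identifiers
    ]

# The relation fragment quoted above; json.loads turns "\/" back into "/".
snippet = r'{"relation":{"is-preprint-of":[{"id-type":"doi","id":"10.1016\/j.cell.2021.07.023","asserted-by":"subject"}]}}'
print(published_dois(json.loads(snippet)))
```

A work with no `relation` field (or no `is-preprint-of` entry) simply yields an empty list.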
I think the way to address this is to capture these relationships in the database and omit preprint results whose published version is also in the database. We already have DOIs for the PMC articles, but we need a good endpoint for bulk-downloading DOI metadata (perhaps just the Crossref API referenced above).
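The filtering step could look roughly like the sketch below. It assumes a hypothetical mapping `preprint_of` (preprint DOI to published DOIs, built from `relation.is-preprint-of`) and a set of DOIs already stored in the database; the function name and data shapes are illustrative, not the project's actual schema.

```python
def drop_superseded_preprints(result_dois, preprint_of, db_dois):
    """Omit preprint results whose published version is also in the database.

    result_dois: DOIs of search results, in ranked order.
    preprint_of: hypothetical mapping, preprint DOI -> list of published DOIs.
    db_dois: DOIs of all articles stored in the database.
    """
    norm = str.lower  # DOIs are case-insensitive, so compare lowercased forms
    db = {norm(d) for d in db_dois}
    links = {norm(k): [norm(v) for v in vs] for k, vs in preprint_of.items()}
    # Keep a result unless it is a preprint of something we already have.
    return [
        d for d in result_dois
        if not any(pub in db for pub in links.get(norm(d), []))
    ]

preprint = "10.1101/2021.02.20.431155"
published = "10.1016/j.cell.2021.07.023"
print(drop_superseded_preprints([preprint, published], {preprint: [published]}, {preprint, published}))
```

If the published article is not in the database, the preprint is kept, so users still see the work.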
Update:
The following API query looks likely to return exactly what we need: https://api.crossref.org/works?select=DOI%2Crelation&filter=doi%3A10.1101%2F2021.02.20.431155%2Cdoi%3A10.1016%2Fj.cell.2021.07.023
i.e., we can fetch any relation.is-preprint-of records for DOIs in batches.
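A batched version of that query might be sketched as follows. The batch size of 50 is an assumption (chosen only to keep URLs short), as are the helper names; `select`, `filter` and `rows` are Crossref `/works` parameters, and repeating the `doi:` filter is what the query above already does. The network call is included for completeness but not exercised here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works"
BATCH_SIZE = 50  # assumption: keeps each request URL reasonably short

def batches(items, size=BATCH_SIZE):
    """Yield consecutive slices of `items` of at most `size` elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def relation_query_url(dois):
    """Build a /works query that returns only DOI and relation for the given DOIs."""
    params = {
        "select": "DOI,relation",
        "filter": ",".join(f"doi:{d}" for d in dois),
        "rows": len(dois),  # Crossref's default page size (20) can be smaller than a batch
    }
    return f"{CROSSREF_WORKS}?{urlencode(params)}"

def fetch_preprint_links(dois):
    """Map each preprint DOI (lowercased) to the DOIs it is a preprint of. Makes network calls."""
    links = {}
    for batch in batches(list(dois)):
        with urlopen(relation_query_url(batch)) as resp:
            items = json.load(resp)["message"]["items"]
        for item in items:
            targets = [
                r["id"]
                for r in item.get("relation", {}).get("is-preprint-of", [])
                if r.get("id-type") == "doi"
            ]
            if targets:
                links[item["DOI"].lower()] = targets
    return links
```

For the two DOIs above, `relation_query_url` reproduces the query URL quoted earlier, with `&rows=2` appended.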
I'm not sure this is a major issue, or an easy one to fix. It was raised by the first user to report any issue.