asreview / synergy-dataset

SYNERGY - Open machine learning dataset on study selection in systematic reviews
Creative Commons Zero v1.0 Universal

Find OpenAlexID with incorrect PMID and correct DOI #101

Open EmilyWes opened 20 hours ago

EmilyWes commented 20 hours ago

Currently, when both the PMID and the DOI are available in a dataset, the enrich.py script first tries to find the OpenAlexID based on the PMID. For records that have a PMID, it then overwrites the current PMID and DOI, even if no OpenAlexID (and thus no DOI) was found. This leaves the DOI empty, so it is never searched on afterwards.

Given that all following steps only use the OpenAlexID, I suggest we simply do not overwrite the PMID and DOI with the values we retrieve from OpenAlex. This solves the problem and has the added benefit that we keep the IDs that were present in the original data and were searched on.
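To illustrate the suggestion, here is a minimal sketch of the lookup order I have in mind. It is not the actual enrich.py code: the function `find_openalex_id`, the `lookup` callable, the record keys `pmid`, `doi` and `openalex_id`, and the example IDs are all hypothetical names made up for this example. The real script would query OpenAlex where the sketch uses a stand-in lookup.

```python
from typing import Callable, Optional


def find_openalex_id(
    record: dict,
    lookup: Callable[[str, str], Optional[str]],
) -> dict:
    """Return a copy of `record` with an added 'openalex_id' key.

    The original PMID and DOI are left untouched, so a failed PMID
    lookup can still fall back to the DOI.
    """
    enriched = dict(record)  # copy, so the input record is not modified
    openalex_id = None
    for id_type in ("pmid", "doi"):  # try PMID first, then DOI
        value = record.get(id_type)
        if not value:
            continue
        openalex_id = lookup(id_type, value)
        if openalex_id:
            break
    enriched["openalex_id"] = openalex_id
    return enriched


# Stand-in for the OpenAlex query (hypothetical): only the DOI is known.
def fake_lookup(id_type: str, value: str) -> Optional[str]:
    known = {("doi", "10.1000/example"): "W123"}
    return known.get((id_type, value))


# A record with an incorrect PMID and a correct DOI.
record = {"pmid": "00000000", "doi": "10.1000/example"}
result = find_openalex_id(record, fake_lookup)
```

With this approach, the record above still ends up with an OpenAlexID via the DOI, and its original PMID and DOI stay as they were in the source data.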