Closed by tmorrell 1 year ago
To avoid maintaining something in a "dataset" collection or SQLite database, I think we should manage this by querying Invenio RDM directly. The table structure of RDM is stable enough to write a SQL function we can use to return a list of RDM record ids that reference the DOI. Then we can use the RDM API to retrieve the existing record(s) and decide how we want to handle the duplicate, e.g. push a new version or create a new record. This would avoid having to maintain a separate dataset (or SQLite table) to track this.
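A rough sketch of that flow, in Python. The names `plan_import` and `find_record_ids` are hypothetical and not part of irdmtools; `find_record_ids` stands in for whatever ends up wrapping the SQL function, and is passed in so the decision logic can be tried without a database.

```python
from typing import Callable, Dict, List


def plan_import(doi: str, find_record_ids: Callable[[str], List[str]]) -> Dict:
    """Decide what to do with an incoming DOI.

    find_record_ids is a hypothetical lookup (assumed to wrap the SQL
    function) that returns the RDM record ids referencing the DOI.
    """
    existing = find_record_ids(doi)
    if not existing:
        # Nothing in RDM references this DOI, so a plain create is safe.
        return {"action": "create", "doi": doi, "existing": []}
    # At least one record already references the DOI. The caller would
    # fetch these via the RDM API and choose between a new version and
    # a new record.
    return {"action": "review_duplicate", "doi": doi, "existing": existing}


def fake_lookup(doi: str) -> List[str]:
    """Stand-in for the database lookup, for illustration only."""
    known = {"10.1234/abc": ["rec-1"]}
    return known.get(doi, [])
```

Calling `plan_import("10.1234/abc", fake_lookup)` reports the existing record id for review, while an unknown DOI comes back as a plain create.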
That works, and I've actually already got some code for that query. Just run https://github.com/caltechlibrary/irdm_harvester/blob/main/check_doi.py, and if it comes back true, don't pass the DOI to the create function.
This is taken care of in the irdm/fixup.py code as Tom suggested.
InvenioRDM doesn't support multiple records with the same DOI.
I've made an initial stab at the changes we need here: https://github.com/caltechlibrary/irdmtools/pull/14. Additionally, we'll need to:
- Keep a dictionary with every DOI that has been transferred
- If a new record has a DOI that has already been transferred, don't pass the DOI to the create function (but it should stay within the metadata)
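The two steps above could look roughly like this in Python. This is only a sketch: `doi_for_create`, the `transferred` dictionary, and the top-level `"doi"` key in the metadata are all assumptions for illustration, not the actual irdmtools code or record layout.

```python
from typing import Dict, Optional


def doi_for_create(record_id: str, metadata: dict,
                   transferred: Dict[str, str]) -> Optional[str]:
    """Return the DOI to hand to the create function, or None.

    transferred maps each DOI already sent over to the id of the first
    record that carried it. The metadata is never modified, so the DOI
    stays in it even when None is returned.
    """
    doi = metadata.get("doi")  # assumed location of the DOI
    if not doi:
        return None
    # DOIs are case-insensitive, so compare on a normalized key.
    key = doi.strip().lower()
    if key in transferred:
        # Already transferred once: skip registering it again.
        return None
    transferred[key] = record_id
    return doi
```

A migration loop would call this once per record and pass the result as the DOI argument of whatever create function is in use, alongside the unchanged metadata.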