caltechlibrary / irdmtools

A Go and Python package for working with InvenioRDM repositories.
https://caltechlibrary.github.io/irdmtools

Transfer of records with duplicate DOIs #15

Closed tmorrell closed 1 year ago

tmorrell commented 1 year ago

InvenioRDM doesn't support multiple records with the same DOI.

I've made an initial stab at the changes we need here: https://github.com/caltechlibrary/irdmtools/pull/14. Additionally, we'll need to:

- Keep a dictionary with every DOI that has been transferred
- If a new record has a DOI that has already been transferred, don't pass the DOI to the create function (but it should stay within the metadata)
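A minimal sketch of the dictionary approach, to make the two steps concrete. The function name `prepare_create_args`, the `doi` and `id` keys, and the shape of the record are all hypothetical and are not taken from irdmtools. DOIs are compared case-insensitively, so the dictionary key is lowercased.

```python
def prepare_create_args(record, transferred_dois):
    """Return (metadata, doi_to_register) for a hypothetical create call.

    `transferred_dois` is the dictionary of DOIs already transferred,
    keyed by the lowercased DOI. The record layout is an assumption.
    """
    doi = record.get("doi")
    if not doi:
        # Nothing to register; the metadata is passed through unchanged.
        return record, None
    key = doi.strip().lower()
    if key in transferred_dois:
        # Duplicate DOI: leave it in the metadata, but don't pass it
        # to the create function.
        return record, None
    # First time this DOI is seen: remember which record claimed it.
    transferred_dois[key] = record.get("id")
    return record, doi
```

With this shape the second record carrying the same DOI still has it in its metadata, but the create call receives `None` for the DOI.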

rsdoiel commented 1 year ago

To avoid maintaining something in a "dataset" collection or SQLite database, I think we should manage this by querying InvenioRDM directly. The table structure of RDM is stable enough to write a SQL function that returns a list of RDM record IDs that reference the DOI. Then we can use the RDM API to retrieve the existing record(s) and decide how we want to handle the duplicate, e.g. push a new version or create a new record.

tmorrell commented 1 year ago

That works, and I've actually already got some code for that query. Just run https://github.com/caltechlibrary/irdm_harvester/blob/main/check_doi.py, and if it comes back true, don't pass the DOI to the create function.
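Roughly, the check-before-create step could look like the sketch below. The interface of check_doi.py isn't shown in this thread, so `doi_exists` is a hypothetical caller-supplied function that returns true when the repository already holds a record with that DOI. `doi_for_create` is also a made-up name.

```python
def doi_for_create(doi, doi_exists):
    """Return the DOI to pass to the create function, or None.

    `doi_exists` is a hypothetical callable standing in for the
    repository lookup; it takes a DOI string and returns a bool.
    """
    if not doi:
        # Record has no DOI at all.
        return None
    if doi_exists(doi):
        # Already in the repository: don't pass the DOI to create.
        return None
    return doi
```

Passing the lookup in as a function keeps the decision separate from how the repository is queried, so the same logic works with a stub in tests.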

rsdoiel commented 1 year ago

This is taken care of in the irdm/fixup.py code as Tom suggested.