Closed tmorrell closed 1 year ago
For EPrint items with multiple documents, the documents need to be split into versions, grouped by version and permissions, before import into RDM. Building the imported version of the record will require multiple RDM API calls. eprint2rdm needs to return a list of RDM records in which each individual RDM record is one RDM version.
All files in an RDM record should have the same "permissions" (document-level metadata in EPrints) and the same version (file-level metadata in EPrints). In the RDM list, put non-public versions first and public versions later.
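A minimal sketch of the grouping and ordering described above. It is not the code in eprints_to_rdm.py; the function name `split_into_versions` and the flat file dictionaries with `security` and `content` keys are assumptions made for illustration, and the real EPrints export may structure these fields differently.

```python
def split_into_versions(files):
    """Group file entries into version buckets, non-public buckets first.

    Each entry is assumed (hypothetically) to be a dict with:
      "security": the permission level, e.g. "public" or "staffonly"
      "content":  the version label, e.g. "published" or "accepted"
    """
    groups = {}
    for entry in files:
        # Files sharing the same permission and version label form one RDM version.
        key = (entry.get("security", "public"), entry.get("content", ""))
        groups.setdefault(key, []).append(entry)

    # sorted() is stable, so buckets keep first-seen order within each class.
    # False sorts before True, which puts non-public buckets first.
    ordered = sorted(groups.items(), key=lambda item: item[0][0] == "public")

    return [
        {"security": security, "content": content, "files": members}
        for (security, content), members in ordered
    ]
```

Each returned bucket would then become one RDM record in the list, so the last (public) bucket ends up as the latest version.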
The metadata we want to retain for each file is
Exclude (filter out) the "volatile" files (e.g. the EPrints-generated thumbnails and index.txt files).
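The filtering step could look roughly like the sketch below. The set of file names is a hypothetical placeholder, not the list EPrints actually generates, and `drop_volatile` and the `filename` key are names invented for this example.

```python
import os

# Hypothetical names of EPrints-generated files; adjust to what the
# repository actually produces.
VOLATILE_FILENAMES = frozenset(
    {"index.txt", "indexcodes.txt", "preview.png", "small.png", "medium.png"}
)


def is_volatile(filename, volatile_names=VOLATILE_FILENAMES):
    """Return True when the file's base name matches a known generated file."""
    return os.path.basename(filename).lower() in volatile_names


def drop_volatile(files, volatile_names=VOLATILE_FILENAMES):
    """Return only the file entries that are not volatile."""
    return [
        entry
        for entry in files
        if not is_volatile(entry.get("filename", ""), volatile_names)
    ]
```

Running this before files are grouped into versions keeps generated thumbnails from creating spurious versions.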
This is done in eprints_to_rdm.py in conjunction with irdm/fixup.py and irdm/irdmtools.py.
We need to be able to map files to versions automatically, based on the file security and file content metadata. An example that needs to be split is https://authors.library.caltech.edu/43491/