caltechlibrary / irdmtools

A Go and Python package for working with InvenioRDM repositories.
https://caltechlibrary.github.io/irdmtools

Do file mapping #3

Closed tmorrell closed 1 year ago

tmorrell commented 1 year ago

We need to be able to map files to versions automatically, based on the file security and file content metadata. An example that needs to be split is https://authors.library.caltech.edu/43491/

rsdoiel commented 1 year ago

EPrints items with multiple documents, versions, and permissions need to be split into separate versions before import into RDM. Building the imported version of the record will require multiple RDM API calls. eprint2rdm needs to return a list of RDM records in which each individual RDM record is one RDM version.

Within each RDM record, everything should have the same "permissions" (document-level metadata in EPrints) and the same version (file-level metadata in EPrints). Put non-public versions first and public versions later in the RDM list order.
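A minimal sketch of the grouping and ordering described above, not the actual eprint2rdm code. The dict keys `security`, `version`, and `files`, the value `"public"`, and the function name `split_into_versions` are assumptions made for illustration.

```python
from collections import defaultdict


def split_into_versions(documents):
    """Group EPrints documents into one entry per (permissions, version) pair.

    Each item in `documents` is assumed to be a dict with the hypothetical
    keys "security", "version", and "files".
    """
    groups = defaultdict(list)
    for doc in documents:
        key = (doc.get("security", "public"), doc.get("version", ""))
        groups[key].extend(doc.get("files", []))

    versions = [
        {"security": security, "version": version, "files": files}
        for (security, version), files in groups.items()
    ]
    # Stable sort on a boolean key: non-public entries (False) come first,
    # public entries (True) come last.
    versions.sort(key=lambda v: v["security"] == "public")
    return versions
```

Each entry in the returned list would then become one RDM version, created in list order so that the public version ends up as the latest one.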

The metadata we want to retain for each file is

Exclude (filter out) the "volatile" files (e.g. the EPrints-generated thumbnails and index.txt files).
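One way the filtering could look, as a rough sketch only. The pattern list is an assumption: `index.txt` comes from the comment above, while the thumbnail names (`preview.*`, `small.*`, `medium.*`, `lightbox.*`) are guesses at what EPrints generates and would need checking against real records. The function names are hypothetical.

```python
import fnmatch

# Assumed filename patterns for EPrints-generated files; adjust after
# checking real records.
VOLATILE_PATTERNS = ("index.txt", "preview.*", "small.*", "medium.*", "lightbox.*")


def is_volatile(filename):
    """Return True if the base name matches one of the assumed patterns."""
    name = filename.rsplit("/", 1)[-1].lower()
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in VOLATILE_PATTERNS)


def drop_volatile(filenames):
    """Keep only the files that should be carried over into RDM."""
    return [f for f in filenames if not is_volatile(f)]
```

Matching on filename alone is a simplification; if the EPrints record marks generated documents explicitly, filtering on that metadata would be more reliable.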

rsdoiel commented 1 year ago

This is done in eprints_to_rdm.py, in conjunction with irdm/fixup.py and irdm/irdmtools.py.