UAL-RE / ReBACH

Python-based tool to enable data preservation to a cloud-hosted storage solution
MIT License

Feat: Check if item version is already preserved before bagging (Issue #102) #103

Open HafeezOJ opened 2 months ago

HafeezOJ commented 2 months ago

Description

During preprocessing, this PR checks whether a bag for the item version already exists in AP Trust and in the Wasabi S3 bucket. If the item version has already been preserved, the hash of the version being prepared for bagging is compared with that version's hash in AP Trust. If the hashes match, the article version is skipped; otherwise, its bag is updated. All activities are logged.
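The skip-or-update decision described above can be sketched roughly as follows. This is only an illustration and not the code in this PR: the function names `metadata_hash` and `decide_action`, the use of SHA-256, and the JSON canonicalisation are all assumptions, and the lookup of the preserved hash in AP Trust is left out entirely.

```python
import hashlib
import json
from typing import Optional


def metadata_hash(metadata: dict) -> str:
    """Hash a metadata dict in a key-order-independent way.

    Assumption: SHA-256 over canonical JSON; the PR may use another scheme.
    """
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def decide_action(current_hash: str, preserved_hash: Optional[str]) -> str:
    """Return 'bag', 'skip' or 'update' for one item version.

    preserved_hash is None when no bag was found for this version
    (hypothetical convention for this sketch).
    """
    if preserved_hash is None:
        return "bag"      # nothing preserved yet, so bag as usual
    if current_hash == preserved_hash:
        return "skip"     # identical version already preserved
    return "update"       # preserved copy differs, so update the bag


current = metadata_hash({"id": 1, "version": 2, "title": "Example"})
print(decide_action(current, None))
print(decide_action(current, current))
print(decide_action(current, "some-other-hash"))
```

Returning a plain string keeps the sketch easy to log, which matches the note that all activities are logged.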

NOTE: Because the metadata is sorted while the metadata hash is computed, this feature may sometimes put a name other than the first author's name in the eventual preservation package file.

PROPOSED SOLUTION: Exclude the authors list from sorting when computing the metadata hash. This is not included in this PR.
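One way the proposed solution might look is sketched below. It is not part of this PR and is only a guess at the approach: `sort_metadata` and `stable_metadata_hash` are hypothetical names, the `authors` key and `full_name` field are assumed from typical Figshare metadata, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
import json


def sort_metadata(value, keep_order_keys=("authors",), _key=None):
    """Recursively sort dict keys and list items, except lists stored
    under a key in keep_order_keys, whose original order is preserved."""
    if isinstance(value, dict):
        return {k: sort_metadata(value[k], keep_order_keys, k)
                for k in sorted(value)}
    if isinstance(value, list):
        items = [sort_metadata(v, keep_order_keys) for v in value]
        if _key in keep_order_keys:
            return items  # e.g. keep the first author first
        # Sort on a JSON rendering so mixed or nested items are comparable.
        return sorted(items, key=lambda v: json.dumps(v, sort_keys=True))
    return value


def stable_metadata_hash(metadata: dict) -> str:
    """Hash the sorted metadata; author order still affects the hash."""
    canonical = json.dumps(sort_metadata(metadata), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


meta = {
    "title": "Example",
    "tags": ["b", "a"],
    "authors": [{"full_name": "Zed"}, {"full_name": "Amy"}],
}
print(sort_metadata(meta)["authors"][0]["full_name"])
```

With this approach the first entry of the authors list is unchanged after sorting, so a downstream step that reads the first author would still see the original one.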

See #93

Documentation Update

Implementation Notes

This PR contains Utils.py in the figshare directory, which houses utility functions. The following functions are available in this PR:

Bag checks are carried out in Article.py and Collection.py inside the figshare directory. Logging is done in app.py.