NYPL / ami-preservation

Repo for NYPL's Audio and Moving Image Preservation Unit. Documentation Site:
https://nypl.github.io/ami-preservation/

More metadata about payload #27

Open nkrabben opened 3 years ago

nkrabben commented 3 years ago

copied from issue on ami-specs https://github.com/NYPL/ami-specifications/issues/19

Define a folder and file name scheme for any metadata about digitized objects that should not go into the payload. Examples include QCTools reports, extracted timecode, and ffmpeg logs.

Proposal: Top-level directory named metadata. Files in the directory should have the same name as the file they relate to, up to the extension, e.g. abc_123456_v01_pm.mkv would have a QCTools report named abc_123456_v01_pm.qctools.tar.gz. Checksums for files in the metadata directory are optional; if recorded, they should be written to the tagmanifest (as allowed by the BagIt spec).

Example:

581608
├── bag-info.txt
├── bagit.txt
├── data
│   ├── PreservationMasters
│   │   ├── myt_581608_v01_pm.json
│   │   └── myt_581608_v01_pm.mkv
│   └── ServiceCopies
│       ├── myt_581608_v01_sc.json
│       └── myt_581608_v01_sc.mp4
├── manifest-md5.txt
├── metadata
│   ├── myt_581608_v01_pm_rp188any_frame_timecodes.txt
│   └── myt_581608_v01_pm.qctools.tar.gz
└── tagmanifest-md5.txt
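
A minimal sketch of the naming rule, in case it helps make the proposal concrete. The function name `metadata_path` and its arguments are hypothetical, not part of any existing NYPL tooling; it only assumes the layout shown in the tree above.

```python
from pathlib import Path


def metadata_path(bag_root, media_file, suffix):
    """Return where a sidecar file for media_file would live in the bag.

    Hypothetical helper: takes the media file name up to its extension
    and appends the given suffix, under the top-level metadata directory.
    """
    stem = Path(media_file).stem  # e.g. "myt_581608_v01_pm"
    return Path(bag_root) / "metadata" / f"{stem}{suffix}"


# Usage, mirroring the two files in the example tree
pm = "data/PreservationMasters/myt_581608_v01_pm.mkv"
print(metadata_path("581608", pm, ".qctools.tar.gz").as_posix())
print(metadata_path("581608", pm, "_rp188any_frame_timecodes.txt").as_posix())
```

Passing the suffix explicitly keeps the helper agnostic about which kinds of reports exist; the two suffixes used here are simply the ones from the example.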
nkrabben commented 3 years ago

kieranjol commented 2 hours ago Curious to know why not in the bag? Will there be checksums elsewhere for those files? I’d consider a similar approach.

nkrabben commented 2 hours ago A big part is because of issues with our ingest processes, which will not be fixed in the near future. This lets us skirt those issues.

The BagIt spec says that any file not in the payload is a tag file, and it can, but doesn't need to, have a checksum listed in the tagmanifest. The nice thing about this is that adding or removing tag files doesn't change the Payload-Oxum in bag-info.txt or the lines in manifest-....txt, so you can potentially bag earlier and then perform additive processes without constantly needing to update those files and then their checksums in the tagmanifest. The not-so-nice thing is that I'm not sure whether any of the bagging tools I use have a mode to update only the tagmanifest. Might have to write one.
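
A rough sketch of what such a tagmanifest updater could look like, using only the Python standard library. The function names `md5_of` and `update_tagmanifest` are made up for illustration; this is not a mode of any existing bagging tool. It assumes an md5 bag laid out as in the example above, and it chooses to list every tag file even though the spec makes that optional.

```python
import hashlib
from pathlib import Path
from typing import List


def md5_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def update_tagmanifest(bag_root) -> List[str]:
    """Rewrite tagmanifest-md5.txt from the tag files currently in the bag.

    Hypothetical helper: skips everything under data/ (the payload) and
    any top-level tagmanifest-*.txt file, since a tag manifest does not
    list tag manifests. The payload manifest and bag-info.txt are read
    but never modified.
    """
    bag_root = Path(bag_root)
    lines = []
    for path in sorted(bag_root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(bag_root)
        if rel.parts[0] == "data":
            continue  # payload file, covered by manifest-md5.txt
        if len(rel.parts) == 1 and rel.name.startswith("tagmanifest-"):
            continue  # tag manifests are not listed in tag manifests
        lines.append(f"{md5_of(path)}  {rel.as_posix()}")
    out = bag_root / "tagmanifest-md5.txt"
    out.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return lines
```

Calling `update_tagmanifest("581608")` after dropping a new report into metadata would regenerate only the tagmanifest, which is the point of the approach: the payload manifest and oxum stay untouched.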

bturkus commented 1 hour ago we shall christen this the "half in the bag" approach

kieranjol commented 3 years ago

I love the idea of bagging early and I like this use of the tag manifest.