archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: DC included as metadata.JSON in Baggit payload does NOT get written to AM-METS #1513

Open currmie opened 2 years ago

currmie commented 2 years ago

Users have reported that Dublin Core metadata specified in a metadata.json file and included in the BagIt payload is not parsed and written to the Archivematica METS.

Is it possible that a reference back to the load_dublin_core.py script needs to be included in src/MCPClient/lib/clientScripts/copy_transfer_submission_documentation.py?

Expected behaviour

The BagIt standard requires that you wrap the payload sub-directories and files within a "data" directory. See the following sample directory structure:

bagTest/
├── bag-info.txt
├── bagit.txt
├── data
│   ├── audio
│   │   └── bird.mp3
│   ├── beihai.tif
│   └── metadata
│       └── metadata.json
├── manifest-md5.txt
└── tagmanifest-md5.txt

There is also an assumption that within the JSON file, we specify the filename field first and then always start the filename path with objects/.

For example:

[
  {
    "filename": "objects/audio/bird.mp3",
    "dc.title": "14000 Caen, France - Bird in my garden",
    "dc.creator": "Nicolas Germain",
    "dc.description": "Bird singing in my garden, Caen, France, Zoom H6",
    "dc.subject": [
      "field recording",
      "soundscapes",
      "radio aporee"
    ]
  }
]
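The two assumptions above (filename comes first, and its path starts with objects/) can be checked before building the bag. The following is only a minimal sketch: check_metadata_entries is a hypothetical helper, not part of Archivematica, and the second sample entry is an invented bad path used purely for illustration.

```python
import json


def check_metadata_entries(entries):
    """Return a list of human-readable problems found in metadata.json entries."""
    problems = []
    for index, entry in enumerate(entries):
        keys = list(entry)  # dicts keep the JSON key order in Python 3.7+
        if not keys or keys[0] != "filename":
            problems.append(f"entry {index}: 'filename' is not the first field")
            continue
        if not entry["filename"].startswith("objects/"):
            problems.append(
                f"entry {index}: {entry['filename']!r} does not start with 'objects/'"
            )
    return problems


# Hypothetical sample: the first entry follows the convention, the second does not.
sample = json.loads("""[
  {"filename": "objects/audio/bird.mp3"},
  {"filename": "data/beihai.tif"}
]""")
problems = check_metadata_entries(sample)
for problem in problems:
    print(problem)
```

An empty result only means the entries follow the stated convention; it says nothing about whether Archivematica will go on to write the metadata to the METS.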

When a user initiates a 'zipped directory' transfer and includes descriptive metadata, Archivematica should theoretically unpack the contents of the package, convert the metadata.json file to CSV format, and then ultimately write the DC to the METS upon AIP generation.
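To make the JSON-to-CSV step concrete, here is a rough sketch of what such a conversion could look like. It is not Archivematica's own conversion code; metadata_json_to_csv is a hypothetical function, and writing multi-valued fields as repeated columns with the same header is my understanding of the metadata.csv layout, so treat that as an assumption.

```python
import csv
import io


def metadata_json_to_csv(entries):
    """Flatten metadata.json-style entries into CSV text.

    Multi-valued fields (lists) become repeated columns with the same header.
    """
    # Work out how many columns each field needs (longest list seen).
    widths = {}
    for entry in entries:
        for key, value in entry.items():
            count = len(value) if isinstance(value, list) else 1
            widths[key] = max(widths.get(key, 0), count)

    # 'filename' goes first; the other fields keep first-seen order.
    fields = ["filename"] + [key for key in widths if key != "filename"]
    header = [field for field in fields for _ in range(widths.get(field, 1))]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for entry in entries:
        row = []
        for field in fields:
            value = entry.get(field, [])
            values = value if isinstance(value, list) else [value]
            # Pad so every row has the same number of cells per field.
            row.extend(values + [""] * (widths.get(field, 1) - len(values)))
        writer.writerow(row)
    return out.getvalue()


entries = [
    {
        "filename": "objects/audio/bird.mp3",
        "dc.title": "14000 Caen, France - Bird in my garden",
        "dc.subject": ["field recording", "soundscapes", "radio aporee"],
    }
]
csv_text = metadata_json_to_csv(entries)
print(csv_text)
```

The csv module quotes the title because it contains a comma, which is one of the details that makes hand-converting JSON to CSV easy to get wrong.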

Current behaviour

AM cannot interpret data/objects/beihai.tif after the bag contents have been unpacked and copied over to the SIP.

Steps to reproduce

  1. Create a zipped directory transfer that conforms to the BagIt standard and is similar to the above example.
  2. Run the transfer through the pipeline to generate and store the AIP.
  3. Review the AM-METS for DC metadata.
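Step 3 can be done by eye, but a small script makes the check repeatable. The sketch below lists every non-empty element in the two standard Dublin Core namespaces found anywhere in a METS document. find_dc_elements is a hypothetical helper, and the sample METS is a hand-written fragment for illustration, not output from a real AIP.

```python
import xml.etree.ElementTree as ET

# The standard Dublin Core element and terms namespaces.
DC_NAMESPACES = (
    "http://purl.org/dc/elements/1.1/",
    "http://purl.org/dc/terms/",
)


def find_dc_elements(mets_xml):
    """Return (local_name, text) pairs for every non-empty DC element in the METS."""
    root = ET.fromstring(mets_xml)
    found = []
    for element in root.iter():
        tag = element.tag
        if not isinstance(tag, str) or not tag.startswith("{"):
            continue  # skip anything without a namespace
        namespace, local_name = tag[1:].split("}", 1)
        text = (element.text or "").strip()
        if namespace in DC_NAMESPACES and text:
            found.append((local_name, text))
    return found


# Hand-written illustrative fragment (assumption: not taken from a real AIP).
sample_mets = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <mets:dmdSec ID="dmdSec_1">
    <mets:mdWrap MDTYPE="DC">
      <mets:xmlData>
        <dcterms:dublincore>
          <dc:title>14000 Caen, France - Bird in my garden</dc:title>
        </dcterms:dublincore>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
</mets:mets>"""

dc_elements = find_dc_elements(sample_mets)
print(dc_elements)
```

For a METS file affected by this issue, the returned list would be expected to be empty even though metadata.json was included in the bag.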

Your environment (version of Archivematica, operating system, other relevant details)

Users have written in about this issue after having reproduced it in versions 1.9, 1.12.1, and 1.13.



Bustel commented 5 days ago

We still see the same behaviour in version 1.16. Are there any plans to work on this? We would much prefer to work with JSON files because the tooling around JSON is much better, and converting JSON to CSV is painful and potentially error-prone.

And if there are plans to fix this: should filenames start with data/ or not? In my opinion, processing should be consistent whether or not bags are unpacked, so the data/ prefix makes no sense to me.