archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: Generate AIP METS fails for bag SIPs if bag-info.txt has multiple instances of the same label #173

Closed: helrond closed this issue 5 years ago.

helrond commented 5 years ago

Expected behaviour: As an end user, I start an unzipped bag transfer. The bag is processed normally by the Archivematica microservices, and the data from bag-info.txt is added to the AIP and DIP METS files.

Current behaviour: Unzipped bag SIPs fail at the Generate AIP METS microservice if their bag-info.txt file contains more than one occurrence of a metadata element (which the BagIt spec allows). The error is:

TypeError("Argument must be bytes or unicode, got 'list'",)
Traceback (most recent call last):
  File "/src/MCPClient/lib/clientScripts/create_mets_v2.py", line 1330, in call
    el = create_object_metadata(job, structMapDivObjects, baseDirectoryPath)
  File "/src/MCPClient/lib/clientScripts/create_mets_v2.py", line 1089, in create_object_metadata
    bag_tag.text = value
  File "src/lxml/lxml.etree.pyx", line 1031, in lxml.etree._Element.text.__set__ (src/lxml/lxml.etree.c:55347)
  File "src/lxml/apihelpers.pxi", line 711, in lxml.etree._setNodeText (src/lxml/lxml.etree.c:24667)
  File "src/lxml/apihelpers.pxi", line 699, in lxml.etree._createTextNode (src/lxml/lxml.etree.c:24516)
  File "src/lxml/apihelpers.pxi", line 1437, in lxml.etree._utf8 (src/lxml/lxml.etree.c:32414)
TypeError: Argument must be bytes or unicode, got 'list'
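The failing line, `bag_tag.text = value` in `create_mets_v2.py`, assumes every bag-info.txt value is a string. bagit-python stores a repeated label as a list of values, and lxml refuses a list for `.text`, which matches the `TypeError` above. A minimal sketch of one way to handle both cases, assuming the parsed metadata is a dict mapping each label to a string or a list of strings, is to emit one element per value. It uses the standard library's `xml.etree.ElementTree` rather than lxml (which offers the same `SubElement` and `.text` API), it assumes every label is a valid XML element name, and the names `add_bag_metadata` and `transfer_metadata` are hypothetical. This is not necessarily the fix Archivematica shipped.

```python
import xml.etree.ElementTree as ET


def add_bag_metadata(parent, bag_info):
    """Append one child element per bag-info.txt value to ``parent``.

    ``bag_info`` maps each label to a string, or to a list of strings when
    the label is repeated in bag-info.txt. Labels are assumed to be valid
    XML element names. (Hypothetical helper, not Archivematica code.)
    """
    for label, value in bag_info.items():
        # Treat a single value as a one-item list so repeated and
        # non-repeated labels go through the same loop.
        values = value if isinstance(value, list) else [value]
        for item in values:
            element = ET.SubElement(parent, label)
            element.text = item
    return parent


# Usage with the repeated Record-Creators label from this issue.
root = ET.Element("transfer_metadata")
add_bag_metadata(root, {
    "Record-Type": "grant records",
    "Record-Creators": [
        "Custard Pie Appreciation Consortium",
        "Desperate Dan Appreciation Society",
    ],
})
print(ET.tostring(root, encoding="unicode"))
```

Writing one element per value keeps the file order of repeated labels and avoids joining values into a single delimited string, which a downstream reader would then have to split again.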

Steps to reproduce: From the Transfer page in the Dashboard, select the Unzipped bag transfer type and browse to a bag whose bag-info.txt file has repeating fields, such as the repeated Record-Creators field below:

Bag-Software-Agent: bagit.py <http://github.com/libraryofcongress/bagit-python>
BagIt-Profile-Identifier: https://raw.githubusercontent.com/RockefellerArchiveCenter/project_electron/master/transfer/organizational-bag-profile.json
Bagging-Date: 2017-12-11T20:09:01.446465
Date-End: 2002-06-22
Date-Start: 2000-05-14
External-Identifier: records-2017-12-11T20:09:01.446465
Internal-Sender-Description: Grant awarded to the Village Green Preservation Society for the purpose of "preserving the old ways from being abused, protecting the new ways for me and for you"
Internal-Sender-Identifier: GrantsFord
Language: eng
Payload-Oxum: 247088.6
Record-Creators: Custard Pie Appreciation Consortium
Record-Creators: Desperate Dan Appreciation Society
Record-Type: grant records
Restrictions: Records open only to Mrs. Mopp and good old Mother Riley
Source-Organization: Ford Foundation
Title: Grant to the Village Green Preservation Society

Process the transfer normally. The SIP fails at the Generate AIP METS microservice.
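Reproducing the steps above needs a bag with a repeated label. A minimal sketch for generating one with only the Python standard library follows. `make_test_bag` is a hypothetical helper: it writes a flat payload (no subdirectories) under `data/`, a SHA-256 manifest, a BagIt 0.97 `bagit.txt`, and a bag-info.txt in which a list value becomes repeated `Label: value` lines. Payload-Oxum is computed as total payload bytes followed by the number of payload files. Archivematica may apply checks beyond basic bag validity, so treat this as a starting point rather than a guaranteed reproduction.

```python
import hashlib
import os
import tempfile


def make_test_bag(bag_dir, payload, bag_info):
    """Write a minimal BagIt bag into ``bag_dir`` (hypothetical helper).

    payload:  dict of file name -> bytes, written flat under data/.
    bag_info: dict of label -> str or list of str; a list is written as
              one "Label: value" line per item, i.e. a repeated label.
    """
    data_dir = os.path.join(bag_dir, "data")
    os.makedirs(data_dir, exist_ok=True)

    manifest_lines = []
    total_bytes = 0
    for name, content in sorted(payload.items()):
        with open(os.path.join(data_dir, name), "wb") as f:
            f.write(content)
        total_bytes += len(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}\n")

    with open(os.path.join(bag_dir, "bagit.txt"), "w", encoding="utf-8") as f:
        f.write("BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")
    with open(os.path.join(bag_dir, "manifest-sha256.txt"), "w", encoding="utf-8") as f:
        f.writelines(manifest_lines)

    # Payload-Oxum is "<total payload octets>.<number of payload files>".
    info = dict(bag_info)
    info["Payload-Oxum"] = f"{total_bytes}.{len(payload)}"
    with open(os.path.join(bag_dir, "bag-info.txt"), "w", encoding="utf-8") as f:
        for label, value in info.items():
            # A list value becomes several lines with the same label.
            for item in value if isinstance(value, list) else [value]:
                f.write(f"{label}: {item}\n")
    return bag_dir


# Usage: a bag with two Record-Creators lines, as in the example above.
bag_path = make_test_bag(
    tempfile.mkdtemp(),
    {"grant.txt": b"Grant letter\n"},
    {
        "Source-Organization": "Ford Foundation",
        "Record-Creators": [
            "Custard Pie Appreciation Consortium",
            "Desperate Dan Appreciation Society",
        ],
    },
)
print(bag_path)
```

The printed directory can then be selected as an Unzipped bag transfer from the Dashboard.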

I haven't tried this with Zipped bags, but I strongly suspect the same behavior will take place.

Your environment (version of Archivematica, OS version, etc.): macOS High Sierra (10.13.6), Archivematica 1.7.2 / Storage Service 0.12 (docker-compose)



peterVG commented 5 years ago

I created an unzipped Bag with duplicate metadata elements as per the example above:

[Screenshot: 2018-09-28 at 16:25:54]

I successfully transferred the Bag into Archivematica 1.8 (Ubuntu Bionic):

[Screenshot: 2018-09-28 at 16:19:53]

And the SIP completes the Prepare AIP microservice:

[Screenshot: 2018-09-28 at 16:22:47]

And the Bag metadata finds its way into an amdSec in the AIP:

[Screenshot: 2018-09-28 at 16:32:24]