archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Failure in "Generate METS.xml" document: cannot convert 'NoneType' object to bytes #1581

Open alexwlchan opened 2 years ago

alexwlchan commented 2 years ago

We've got a bag which is failing at the "Generate METS.xml" step, we think because of something in the rights.csv.

Expected behaviour

The bag is stored correctly.

Current behaviour

All the steps prior to "Generate METS.xml document" complete successfully:

[Screenshot 2022-10-21 at 12 58 13]

That step fails with exit code 1 and the following stderr:

cannot convert 'NoneType' object to bytes
Traceback (most recent call last):
  File "/src/src/MCPClient/lib/job.py", line 113, in JobContext
    yield
  File "/src/src/MCPClient/lib/clientScripts/create_transfer_mets.py", line 842, in call
    args.xml_file, args.base_path, args.base_path_string, args.sip_uuid
  File "/src/src/MCPClient/lib/clientScripts/create_transfer_mets.py", line 105, in write_mets
    fsentry_tree.scan()
  File "/src/src/MCPClient/lib/clientScripts/create_transfer_mets.py", line 194, in scan
    self.load_rights_data_from_db()
  File "/src/src/MCPClient/lib/clientScripts/create_transfer_mets.py", line 284, in load_rights_data_from_db
    premis_rights = rights_to_premis(rights, fsentry.file_uuid)
  File "/src/src/MCPClient/lib/clientScripts/create_transfer_mets.py", line 825, in rights_to_premis
    premis_data, premis_version=PREMIS_META["version"]
  File "/usr/local/lib/python3.6/dist-packages/metsrw/plugins/premisrw/premis.py", line 722, in data_to_premis
    return _data_to_lxml_el(data, "premis", nsmap)
  File "/usr/local/lib/python3.6/dist-packages/metsrw/plugins/premisrw/premis.py", line 609, in _data_to_lxml_el
    element, ns, nsmap, element_maker=element_maker, snake=snake
  File "/usr/local/lib/python3.6/dist-packages/metsrw/plugins/premisrw/premis.py", line 609, in _data_to_lxml_el
    element, ns, nsmap, element_maker=element_maker, snake=snake
  File "/usr/local/lib/python3.6/dist-packages/metsrw/plugins/premisrw/premis.py", line 609, in _data_to_lxml_el
    element, ns, nsmap, element_maker=element_maker, snake=snake
  File "/usr/local/lib/python3.6/dist-packages/metsrw/plugins/premisrw/premis.py", line 619, in _data_to_lxml_el
    args.append(six.binary_type(element))
TypeError: cannot convert 'NoneType' object to bytes
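
The last frame of the traceback can be reproduced in isolation. The snippet below is a minimal sketch, not the metsrw code itself: `to_bytes` is a hypothetical stand-in for the `six.binary_type(element)` call, relying on the fact that `six.binary_type` is an alias for `bytes` on Python 3.

```python
def to_bytes(element):
    """Hypothetical stand-in for six.binary_type(element) on Python 3."""
    # six.binary_type is bytes on Python 3, so this is the same conversion.
    return bytes(element)


try:
    to_bytes(None)
except TypeError as exc:
    # Same error class as the final line of the traceback above.
    message = str(exc)
    print(message)
```

This suggests the failure does not depend on anything else in the package: any None value that reaches this conversion raises the TypeError.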

Steps to reproduce

  1. Upload this example package to Archivematica: rights_example.zip. It contains three files:

    $ cat example/greeting.txt
    Hello world
    
    $ cat metadata/metadata.csv
    filename,dc.identifier
    objects/,PP/EXAMPLE/1⏎
    
    $ cat metadata/rights.csv
    file,basis,status,determination_date,jurisdiction,start_date,end_date,terms,citation,note,grant_act,grant_restriction,grant_start_date,grant_end_date,grant_note,doc_id_type,doc_id_value,doc_id_role
    objects/greeting.txt,license,,,,,,,,CC-BY,use,,,,Open,,,⏎
  2. Send it for processing through Archivematica, using our default processing config.

Your environment (version of Archivematica, operating system, other relevant details)

We're running the Docker images created from the v1.13.2 tag in the artefactual/archivematica repo, with just a couple of files replaced with custom versions.

As far as I can tell, this is where the error is introduced: https://github.com/artefactual/archivematica/blob/4f4605453d5a8796f6a739fa9664921bdb3418f2/src/MCPClient/lib/clientScripts/create_transfer_mets.py#L497

It's possible for license_section.licenseterms to be None, and then when that information is written into the PREMIS, the premisrw plugin fails. If I add an "if license_section.licenseterms is not None:" guard above that line, the transfer package is processed successfully.
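
A rough sketch of that kind of guard, for illustration only. The function name `build_license_information`, the element names in the tuple, and the use of `SimpleNamespace` as a stand-in for the database row are all assumptions; the real code in create_transfer_mets.py may be structured differently.

```python
from types import SimpleNamespace


def build_license_information(license_section):
    """Build a nested (name, child, ...) tuple, skipping unset optional values.

    Hypothetical helper: it only illustrates the "is not None" guard.
    """
    data = ("license_information",)
    # Only emit the element when the nullable field actually has a value.
    if license_section.licenseterms is not None:
        data += (("license_terms", license_section.licenseterms),)
    return data


# Stand-ins for database rows with and without license terms.
with_terms = SimpleNamespace(licenseterms="CC-BY")
without_terms = SimpleNamespace(licenseterms=None)

print(build_license_information(with_terms))
print(build_license_information(without_terms))
```

Whether an element with no children should be written at all, or left out of the PREMIS entirely, is a separate decision that this sketch does not settle.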



replaceafill commented 5 months ago

This is still present in https://github.com/artefactual/archivematica/commit/5210e54c85362311a1fa0f99df5f046740bea6b5. The problem seems to be that the create_transfer_mets client script doesn't consider that some of the fields related to rights statements are nullable. Nullable fields that represent dates are handled through a clean_date helper, which avoids this problem for those fields. Even though test coverage of the script is relatively high, the fixtures used are very complete, so the case where these fields are null is not exercised.
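
One possible direction, sketched under assumptions: instead of guarding each nullable field individually, None values could be pruned from the nested tuple data before it is handed to the PREMIS serializer. `prune_none` is a hypothetical helper, not part of Archivematica or metsrw, and the element names in the example are illustrative only.

```python
def prune_none(data):
    """Return a copy of a nested (name, child, ...) tuple without None leaves.

    An element left with no children is dropped as well, and None is
    returned when nothing remains. Hypothetical helper for illustration.
    """
    name, children = data[0], data[1:]
    kept = []
    for child in children:
        if child is None:
            continue
        if isinstance(child, tuple):
            child = prune_none(child)
            if child is None:
                continue
        kept.append(child)
    if not kept:
        return None
    return (name,) + tuple(kept)


# A sparse rights statement, similar in spirit to the rights.csv row above:
# the license terms are unset, but a note is present.
sparse = (
    "license_information",
    ("license_terms", None),
    ("license_note", "CC-BY"),
)
print(prune_none(sparse))
```

A sparse input like this one is also the kind of fixture that would exercise the null case in the test suite.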