artefactual-labs / mets-reader-writer

Library to parse and create METS files, especially for Archivematica.
https://mets-reader-writer.readthedocs.io
GNU Affero General Public License v3.0

Problem: `FSEntry` instances can circularly reference themselves in their `derived_from` attributes #37

Open jrwdunham opened 6 years ago

jrwdunham commented 6 years ago

When certain METS files are parsed, an `FSEntry` can end up as its own `derived_from` value, i.e. `f.derived_from is f` holds for some entry `f`. Compare the strange `derived_from` values and the missing UUIDs in the following:

P1050152.JPG with UUID None is derived from P1050152.JPG with UUID None
P1050154.JPG with UUID None is derived from P1050152.JPG with UUID None
P1050155.JPG with UUID None is derived from P1050152.JPG with UUID None
P1050156.JPG with UUID None is derived from P1050152.JPG with UUID None
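A check for the self-reference described above could look roughly like the sketch below. `FakeEntry` is a hypothetical stand-in that only mimics the `name`, `file_uuid` and `derived_from` attributes mentioned in this issue; it is not the library's `FSEntry`, and `self_referencing` is an illustrative helper, not part of metsrw.

```python
class FakeEntry:
    """Hypothetical stand-in for an FSEntry (assumption, not the real class)."""

    def __init__(self, name, file_uuid=None, derived_from=None):
        self.name = name
        self.file_uuid = file_uuid
        self.derived_from = derived_from


def self_referencing(entries):
    """Return the entries whose derived_from points back at the entry itself."""
    return [e for e in entries if e.derived_from is e]


# Rebuild the shape of the report above: the first file is derived from
# itself, and a second file is derived from the first.
first = FakeEntry("P1050152.JPG")
first.derived_from = first
second = FakeEntry("P1050154.JPG", derived_from=first)

bad = self_referencing([first, second])
for e in bad:
    print(f"{e.name} with UUID {e.file_uuid} is derived from itself")
```

The identity test (`is`) is deliberate here: comparing by name would also flag the second file, which merely points at a different object with a similar-looking label.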

Parsing the METS file of the AIP at http://am17x.qa.archivematica.org/archival-storage/6214faf5-eab6-424c-b0f9-b1078e7c0828/ will exhibit this behaviour. This seems to be related to the presence of USE="service" type files.

<mets:div LABEL="service" TYPE="Directory" DMDID="dmdSec_2">
  <mets:div LABEL="P1050152.JPG" TYPE="Item">
    <mets:fptr FILEID="file-acabdea5-3f09-4dd4-814c-cab7cbc662dc"/>
  </mets:div>
  <mets:div LABEL="P1050154.JPG" TYPE="Item">
    <mets:fptr FILEID="file-bb7fb59a-a858-482d-b8f8-d9631356d5cc"/>
  </mets:div>
  <mets:div LABEL="P1050155.JPG" TYPE="Item">
    <mets:fptr FILEID="file-414c5cc1-f5df-4a4a-8621-c4303b82092f"/>
  </mets:div>
  <mets:div LABEL="P1050156.JPG" TYPE="Item">
    <mets:fptr FILEID="file-d5c9fcbe-284d-44f9-a6b8-f6518185518e"/>
  </mets:div>
</mets:div>

This ultimately triggers `RuntimeError: maximum recursion depth exceeded` when attempting an AIP re-ingest. See https://github.com/artefactual/archivematica-storage-service/issues/254.
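Until the root cause is found, code that follows `derived_from` links could guard against cycles instead of recursing without bound. The following is a minimal sketch under stated assumptions: `Node` is a hypothetical stand-in for an entry with a `derived_from` attribute, and `derivation_chain` is an illustrative helper, not something the library or the storage service is known to provide.

```python
class Node:
    """Hypothetical stand-in for an entry with a derived_from link (assumption)."""

    def __init__(self, name, derived_from=None):
        self.name = name
        self.derived_from = derived_from


def derivation_chain(entry):
    """Follow derived_from links iteratively.

    Returns (chain, has_cycle): the entries visited in order, and whether
    the walk stopped because it reached an entry it had already seen.
    """
    seen = set()
    chain = []
    current = entry
    while current is not None:
        if id(current) in seen:
            # We are back at an entry already visited: stop instead of looping.
            return chain, True
        seen.add(id(current))
        chain.append(current)
        current = getattr(current, "derived_from", None)
    return chain, False


# A self-referencing entry, as in the report above.
looped = Node("P1050152.JPG")
looped.derived_from = looped
loop_chain, loop_flag = derivation_chain(looped)

# A well-formed chain: derivative -> original.
original = Node("original.tif")
derivative = Node("access.jpg", derived_from=original)
ok_chain, ok_flag = derivation_chain(derivative)
```

Walking with a loop and a set of visited object ids keeps the traversal bounded by the number of entries, so a bad METS file produces a detectable flag rather than a recursion error.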

More investigation needed.