artefactual-labs / mets-reader-writer

Library to parse and create METS files, especially for Archivematica.
https://mets-reader-writer.readthedocs.io
GNU Affero General Public License v3.0

Fix empty PREMIS:OBJECT errors #62

Closed: cole closed this 5 years ago

cole commented 5 years ago

I found this issue hard to debug because a couple of things are interacting:

  1. Duplicate ids are generated because ids come from randint.
  2. Duplicate ids result in multiple in-memory references to the same XML element on the next metsrw parse of the document.
  3. lxml recognizes the duplicate references and interprets them as a moved element on the next write, resulting in empty containers.
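
For reviewers unfamiliar with the lxml behaviour behind point 3: an lxml element can have only one parent, so appending an element that already sits in a tree moves it rather than copying it. The snippet below is a minimal sketch of that effect in isolation. The element names (`container`, `object`, `wrapper`) are made up for illustration and this is not metsrw code.

```python
from lxml import etree

# Hypothetical document: a container holding a single child element.
source = etree.fromstring("<container><object>data</object></container>")
target = etree.Element("wrapper")

# Take a reference to the child and append it to a different parent.
child = source[0]
target.append(child)  # lxml moves the element; it does not copy it

# The original container has lost its child and will serialize as empty.
print(len(source))
print(len(target))
print(etree.tostring(source))
```

If two in-memory objects end up holding the same element, whichever one is serialized last wins and the other container is written out empty, which matches the symptom described above.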

I've moved to sequential id generation and added a copy when parsing contained XML; that should be a little safer.
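
To illustrate why sequential ids are safer, here is a rough sketch comparing the two approaches. The names `random_ids` and `SequentialIds`, the `obj`/`amdSec` prefixes, and the tiny id range are assumptions chosen to make the collision obvious; they are not the actual metsrw implementation.

```python
import itertools
import random


def random_ids(n, upper, seed=0):
    """Draw n ids from randint(1, upper); nothing prevents repeats."""
    rng = random.Random(seed)
    return [f"obj_{rng.randint(1, upper)}" for _ in range(n)]


class SequentialIds:
    """Hypothetical generator that hands out prefix_1, prefix_2, ..."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._counter = itertools.count(1)

    def next_id(self):
        return f"{self.prefix}_{next(self._counter)}"


# Drawing 11 ids from only 10 possible values must repeat at least one.
drawn = random_ids(11, 10)
print(len(drawn), "drawn,", len(set(drawn)), "unique")

# A counter never repeats within a single generator.
gen = SequentialIds("amdSec")
print([gen.next_id() for _ in range(3)])
```

With a larger range the random approach collides less often, but the chance grows with the number of ids in a document, whereas a counter is unique by construction.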

Connects to https://github.com/archivematica/Issues/issues/442.