edsu closed this 2 years ago
It sounds like one of the datastreams in druid:bb059dp5973 is missing a created date.
Having a created date is required for writing to OCFL I imagine?
It definitely needs a created date. I haven't personally seen a datastream without a created date, so I'm a little surprised that you have some. Is created date not a required field in Fedora 3? I think it's an open question how situations like this should be handled. If the date is missing, would you want the current date to be used? That seems undesirable to me. Perhaps it could be taken off the object?
It's strange: all the <datastreamVersion> elements in that FOXML have a CREATED attribute. I sprinkled in some additional debugging and I can see that this conditional is returning false, so the following lookup fails since there's nothing in the HashMap?
<foxml:datastream ID="RELS-EXT" STATE="A" CONTROL_GROUP="X" VERSIONABLE="true">
  <foxml:datastreamVersion ID="RELS-EXT.1" LABEL="RDF Statements about this object" CREATED="2012-05-07T22:32:05.346Z" MIMETYPE="application/rdf+xml" FORMAT_URI="info:fedora/fedora-system:FedoraRELSExt-1.0" SIZE="458">
    <foxml:xmlContent>
      <rdf:RDF xmlns:fedora-model="info:fedora/fedora-system:def/model#" xmlns:hydra="http://projecthydra.org/ns/relations#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about="info:fedora/druid:bb059dp5973">
          <hydra:isGovernedBy rdf:resource="info:fedora/druid:rd845kr7465"/>
          <fedora-model:hasModel rdf:resource="info:fedora/afmodel:Dor_Item"/>
        </rdf:Description>
      </rdf:RDF>
    </foxml:xmlContent>
  </foxml:datastreamVersion>
  <foxml:datastreamVersion ID="RELS-EXT.0" LABEL="RDF Statements about this object" CREATED="2012-02-14T19:26:13.359Z" MIMETYPE="application/rdf+xml" FORMAT_URI="info:fedora/fedora-system:FedoraRELSExt-1.0" SIZE="427">
    <foxml:xmlContent>
      <rdf:RDF xmlns:fedora="info:fedora/fedora-system:def/relations-external#" xmlns:fedora-model="info:fedora/fedora-system:def/model#" xmlns:hydra="http://projecthydra.org/ns/relations#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about="info:fedora/druid:bb059dp5973">
          <hydra:isGovernedBy rdf:resource="info:fedora/druid:rd845kr7465"/>
        </rdf:Description>
      </rdf:RDF>
    </foxml:xmlContent>
  </foxml:datastreamVersion>
Is the ordering of IDs significant here? RELS-EXT.1 and then RELS-EXT.0?
Yep, that's the bug. It's not sorting the datastream versions after it reads them from the FOXML and is relying on them appearing in the correct order. Thanks for digging into that. I can get that patched next week.
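For anyone who wants to check their own exports for this ordering problem, here is a rough Python sketch of the idea. It is not the migration-utils patch (that code is Java), just an illustration of sorting datastream versions by their CREATED attribute instead of trusting document order. The helper names `parse_created` and `sorted_versions` are made up for this example, and the sample is a trimmed version of the RELS-EXT datastream above with the FOXML namespace declared so it parses on its own.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Standard FOXML namespace URI.
FOXML_NS = "info:fedora/fedora-system:def/foxml#"


def parse_created(value):
    """Parse a FOXML CREATED timestamp, with or without fractional seconds."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized CREATED timestamp: {value!r}")


def sorted_versions(datastream):
    """Return the datastreamVersion elements ordered oldest to newest.

    Hypothetical helper: fails loudly if a version has no CREATED
    attribute rather than substituting the current date.
    """
    versions = datastream.findall(f"{{{FOXML_NS}}}datastreamVersion")
    missing = [v.get("ID") for v in versions if v.get("CREATED") is None]
    if missing:
        raise ValueError(f"versions without CREATED: {missing}")
    return sorted(versions, key=lambda v: parse_created(v.get("CREATED")))


# Trimmed copy of the RELS-EXT datastream, newest version listed first.
SAMPLE = """
<foxml:datastream xmlns:foxml="info:fedora/fedora-system:def/foxml#"
                  ID="RELS-EXT" STATE="A" CONTROL_GROUP="X" VERSIONABLE="true">
  <foxml:datastreamVersion ID="RELS-EXT.1" CREATED="2012-05-07T22:32:05.346Z"/>
  <foxml:datastreamVersion ID="RELS-EXT.0" CREATED="2012-02-14T19:26:13.359Z"/>
</foxml:datastream>
""".strip()

datastream = ET.fromstring(SAMPLE)
ordered_ids = [v.get("ID") for v in sorted_versions(datastream)]
print(ordered_ids)  # ['RELS-EXT.0', 'RELS-EXT.1']
```

Sorting on the parsed timestamp rather than on the ID suffix is a deliberate choice in this sketch: with hand-edited datastreams the IDs may not be reliable, whereas CREATED is what the migration needs to be in order.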
That would be great! Since our repository system allowed hand-editing of datastreams, our data might exercise lots of corner cases.
I was just experimenting with the foxml export since I had it handy, but if we were to use migration-utils it would likely be with the fcrepo3 filesystem. Is that code path affected by this ordering too?
@edsu This PR should fix this problem: https://github.com/fcrepo-exts/migration-utils/pull/182
Are you able to try it?
Yes, it worked! I see the object that failed above was able to be written. Feel free to close this or leave open if you want to use it as a reminder to release the fix.
I've been testing the conversion of some exported FOXML with this command:
The majority of objects are processed fine, but it has been running into this error occasionally (20/1000 objects):
The resulting object does not appear to be written to the OCFL tree. Does anyone have a hunch what might be the problem here?