fcrepo-exts / migration-utils

An in-development framework for managing data migrations from previous versions to 4.x.
Apache License 2.0

java.lang.NullPointerException: text #180

Closed edsu closed 2 years ago

edsu commented 2 years ago

I've been testing the conversion of some exported FOXML with this command:

java -jar migration-utils-6.1.0-driver.jar \
  --source-type exported \
  --exported-dir archive \
  --target-dir ocfl \
  --debug

The majority of objects are processed fine, but it has been running into this error occasionally (20/1000 objects):

DEBUG 14:03:08.394 (UserProvidedPidListManager) PID: druid:bb059dp5973, accept? true
DEBUG 14:03:08.394 (ResumePidListManager) PID: druid:bb059dp5973, accept? true
INFO 14:03:08.394 (Migrator) Processing "druid:bb059dp5973"...
DEBUG 14:03:08.408 (ArchiveGroupHandler) Committing object <info:fedora/druid:bb059dp5973>
ERROR 14:03:08.424 (Migrator) MIGRATION_FAILURE: pid="druid:bb059dp5973", message="text"
java.lang.NullPointerException: text
        at java.base/java.util.Objects.requireNonNull(Objects.java:233)
        at java.base/java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1955)
        at java.base/java.time.Instant.parse(Instant.java:399)
        at org.fcrepo.migration.handlers.ocfl.ArchiveGroupHandler.createDatastreamHeaders(ArchiveGroupHandler.java:618)
        at org.fcrepo.migration.handlers.ocfl.ArchiveGroupHandler.processObjectVersions(ArchiveGroupHandler.java:256)
        at org.fcrepo.migration.handlers.ObjectAbstractionStreamingFedoraObjectHandler.completeObject(ObjectAbstractionStreamingFedoraObjectHandler.java:84)
        at org.fcrepo.migration.foxml.FoxmlInputStreamFedoraObjectProcessor.complete(FoxmlInputStreamFedoraObjectProcessor.java:223)
        at org.fcrepo.migration.foxml.FoxmlInputStreamFedoraObjectProcessor.lambda$processObject$0(FoxmlInputStreamFedoraObjectProcessor.java:218)
        at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:79)
        at org.fcrepo.migration.foxml.FoxmlInputStreamFedoraObjectProcessor.processObject(FoxmlInputStreamFedoraObjectProcessor.java:218)
        at org.fcrepo.migration.Migrator.run(Migrator.java:161)
        at org.fcrepo.migration.PicocliMigrator.call(PicocliMigrator.java:328)
        at org.fcrepo.migration.PicocliMigrator.call(PicocliMigrator.java:51)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1743)
        at picocli.CommandLine.access$900(CommandLine.java:145)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2101)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2068)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:1935)
        at picocli.CommandLine.execute(CommandLine.java:1864)
        at org.fcrepo.migration.PicocliMigrator.main(PicocliMigrator.java:175)

The resulting object does not appear to be written to the OCFL tree. Does anyone have a hunch what might be the problem here?

pwinckles commented 2 years ago

It sounds like one of the datastreams in druid:bb059dp5973 is missing a created date.
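The top three frames of the trace are consistent with `Instant.parse` being handed a null string: the JDK's null check names its parameter `text`, which is where the otherwise cryptic message comes from. A minimal standalone sketch of that behavior follows (the class and method names are made up for illustration and are not part of migration-utils):

```java
import java.time.Instant;

public class NullCreatedDateDemo {
    // Returns the exception message produced when Instant.parse receives null,
    // or a marker string if no NullPointerException is thrown.
    static String parseNullMessage() {
        try {
            Instant.parse(null);
            return "no exception";
        } catch (NullPointerException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Prints the message carried by the NullPointerException.
        System.out.println(parseNullMessage());
    }
}
```

So the error only says that the date string reaching `Instant.parse` was null; it does not by itself show that the FOXML lacks a CREATED attribute.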

edsu commented 2 years ago

Having a created date is required for writing to OCFL I imagine?

pwinckles commented 2 years ago

It definitely needs a created date. I haven't personally seen a datastream without a created date, so I'm a little surprised that you have some. Is created date not a required field in Fedora 3? I think it's an open question how situations like this should be handled. If the date is missing, would you want the current date to be used? That seems undesirable to me. Perhaps it could be taken off the object?

edsu commented 2 years ago

It's strange: all the <datastreamVersion> elements in that FOXML have a CREATED attribute. I sprinkled in some additional debugging and I can see that this conditional is returning false, so the following lookup fails since there's nothing in the HashMap?

  <foxml:datastream ID="RELS-EXT" STATE="A" CONTROL_GROUP="X" VERSIONABLE="true">
    <foxml:datastreamVersion ID="RELS-EXT.1" LABEL="RDF Statements about this object" CREATED="2012-05-07T22:32:05.346Z" MIMETYPE="application/rdf+xml" FORMAT_URI="info:fedora/fedora-system:FedoraRELSExt-1.0" SIZE="458">
      <foxml:xmlContent>
        <rdf:RDF xmlns:fedora-model="info:fedora/fedora-system:def/model#" xmlns:hydra="http://projecthydra.org/ns/relations#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
          <rdf:Description rdf:about="info:fedora/druid:bb059dp5973">
            <hydra:isGovernedBy rdf:resource="info:fedora/druid:rd845kr7465"/>
            <fedora-model:hasModel rdf:resource="info:fedora/afmodel:Dor_Item"/>
          </rdf:Description>
        </rdf:RDF>
      </foxml:xmlContent>
    </foxml:datastreamVersion>
    <foxml:datastreamVersion ID="RELS-EXT.0" LABEL="RDF Statements about this object" CREATED="2012-02-14T19:26:13.359Z" MIMETYPE="application/rdf+xml" FORMAT_URI="info:fedora/fedora-system:FedoraRELSExt-1.0" SIZE="427">
      <foxml:xmlContent>
        <rdf:RDF xmlns:fedora="info:fedora/fedora-system:def/relations-external#" xmlns:fedora-model="info:fedora/fedora-system:def/model#" xmlns:hydra="http://projecthydra.org/ns/relations#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
          <rdf:Description rdf:about="info:fedora/druid:bb059dp5973">
            <hydra:isGovernedBy rdf:resource="info:fedora/druid:rd845kr7465"/>
          </rdf:Description>
        </rdf:RDF>
      </foxml:xmlContent>
    </foxml:datastreamVersion>

Is the ordering of IDs significant here? RELS-EXT.1 and then RELS-EXT.0?

pwinckles commented 2 years ago

Yep, that's the bug. It's not sorting the datastream versions after it reads them from the FOXML and is relying on them appearing in the correct order. Thanks for digging into that. I can get that patched next week.
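As a rough illustration of the idea (not the actual patch, and not the migration-utils types), versions can be ordered by their CREATED timestamp before processing so that document order stops mattering. The `Version` class and `oldestFirst` method below are hypothetical names; the IDs and dates are taken from the FOXML excerpt above.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortVersionsSketch {
    // Hypothetical stand-in for a parsed <foxml:datastreamVersion> element.
    static final class Version {
        final String id;
        final Instant created;

        Version(String id, String created) {
            this.id = id;
            this.created = Instant.parse(created);
        }
    }

    // Returns a copy ordered oldest-first by CREATED, whatever the input order was.
    static List<Version> oldestFirst(List<Version> versions) {
        List<Version> sorted = new ArrayList<>(versions);
        sorted.sort(Comparator.comparing((Version v) -> v.created));
        return sorted;
    }

    public static void main(String[] args) {
        // Same order as in the exported FOXML: newest version listed first.
        List<Version> inFileOrder = List.of(
            new Version("RELS-EXT.1", "2012-05-07T22:32:05.346Z"),
            new Version("RELS-EXT.0", "2012-02-14T19:26:13.359Z"));
        for (Version v : oldestFirst(inFileOrder)) {
            System.out.println(v.id + " " + v.created);
        }
    }
}
```

Sorting on the timestamp rather than on the numeric ID suffix is an assumption here; hand-edited FOXML could make either key unreliable, so the real fix may choose differently.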

edsu commented 2 years ago

That would be great! Since our repository system allowed hand-editing of datastreams, our data might exercise lots of corner cases.

I was just experimenting with the foxml export since I had it handy, but if we were to use migration-utils it would likely be with the fcrepo3 filesystem. Is that code path affected by this ordering too?

pwinckles commented 2 years ago

@edsu This PR should fix this problem: https://github.com/fcrepo-exts/migration-utils/pull/182

Are you able to try it?

edsu commented 2 years ago

Yes, it worked! I see the object that failed above was able to be written. Feel free to close this, or leave it open if you want to use it as a reminder to release the fix.