In OCR-D, long ago we moved away from absolute filenames and file:// refs in FLocat.
When calling de.lmu.cis.ocrd.cli.PostCorrectionCommand with an absolute path to the METS, it runs through, but produces output FLocats with absolute paths, which is (now) incorrect.
But when calling with just mets.xml inside the workspace directory, the postprocessor crashes:
22:32:03.614 DEBUG cis.PostCorrectionCommand - loading page
java.lang.NullPointerException
at de.lmu.cis.ocrd.pagexml.METS$File.openLocalPath(METS.java:175)
at de.lmu.cis.ocrd.pagexml.METS$File.openInputStream(METS.java:161)
at de.lmu.cis.ocrd.pagexml.METSFileGroupReader.getPages(METSFileGroupReader.java:41)
at de.lmu.cis.ocrd.pagexml.METSFileGroupReader.eachWord(METSFileGroupReader.java:54)
at de.lmu.cis.ocrd.pagexml.METSFileGroupReader.getBaseOCRTokenReader(METSFileGroupReader.java:77)
at de.lmu.cis.ocrd.pagexml.Workspace.getBaseOCRTokenReader(Workspace.java:33)
at de.lmu.cis.ocrd.cli.ParametersCommand.getProfile(ParametersCommand.java:92)
at de.lmu.cis.ocrd.cli.ParametersCommand.getProfile(ParametersCommand.java:61)
at de.lmu.cis.ocrd.cli.PostCorrectionCommand.predictRankings(PostCorrectionCommand.java:96)
at de.lmu.cis.ocrd.cli.PostCorrectionCommand.postCorrect(PostCorrectionCommand.java:61)
at de.lmu.cis.ocrd.cli.PostCorrectionCommand.execute(PostCorrectionCommand.java:37)
at de.lmu.cis.ocrd.cli.Main.run(Main.java:33)
at de.lmu.cis.ocrd.cli.Main.main(Main.java:9)
The reason is simply that when opening input files via METS.File.openLocalPath, the first reference (https://github.com/cisocrgroup/ocrd-postcorrection/blob/49decc4b9b2f38a16c49ff3b3be36a708a4d5077/src/main/java/de/lmu/cis/ocrd/pagexml/METS.java#L175) is null, because the file instance gets created in https://github.com/cisocrgroup/ocrd-postcorrection/blob/49decc4b9b2f38a16c49ff3b3be36a708a4d5077/src/main/java/de/lmu/cis/ocrd/pagexml/METS.java#L69, which expands to null for the parent of the relative path mets.xml.
So IMO the best fix would be to replace https://github.com/cisocrgroup/ocrd-postcorrection/blob/49decc4b9b2f38a16c49ff3b3be36a708a4d5077/src/main/java/de/lmu/cis/ocrd/pagexml/METS.java#L102 with the current working directory if workspace is indeed empty.
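To illustrate the null parent and the proposed fallback, here is a minimal standalone sketch. It is not the project's code: the class WorkspaceDirSketch and the helper workspaceDir are hypothetical names, and I am assuming the parent is derived through java.nio.file.Path (java.io.File.getParent() returns null for a bare filename in the same way).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class WorkspaceDirSketch {
    // Hypothetical helper, not part of ocrd-postcorrection:
    // derive the workspace directory from the METS path, falling back
    // to the current working directory when the path has no parent.
    public static Path workspaceDir(String metsPath) {
        Path parent = Paths.get(metsPath).getParent();
        if (parent == null) {
            // A bare relative name like "mets.xml" has no parent component,
            // so use the current working directory instead.
            return Paths.get("").toAbsolutePath();
        }
        return parent;
    }

    public static void main(String[] args) {
        // The root cause: the parent of a bare relative filename is null.
        System.out.println(Paths.get("mets.xml").getParent()); // prints "null"
        // With the fallback, the same call yields a usable directory.
        System.out.println(workspaceDir("mets.xml"));
        // Paths that do have a parent are passed through unchanged.
        System.out.println(workspaceDir("workspace/mets.xml"));
    }
}
```

Note that this sketch returns the working directory in absolute form. Since output FLocats should stay relative, the real fix would still have to relativize file paths against that directory when writing the METS.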