OCR4all / LAREX

A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.
MIT License
179 stars, 33 forks

NullPointerException during PageXMLWriter.editExistingPageXML #277

Closed: bertsky closed this issue 3 years ago

bertsky commented 3 years ago

While testing PAGE editing on the dev branch (opening a file, changing some TextRegion/@type and saving it), I received an exception:

java.lang.NullPointerException
    at de.uniwue.web.io.PageXMLWriter.mergeElementChangesIntoLayout(PageXMLWriter.java:261)
    at de.uniwue.web.io.PageXMLWriter.editPageLayoutFromResults(PageXMLWriter.java:165)
    at de.uniwue.web.io.PageXMLWriter.editExistingPageXML(PageXMLWriter.java:127)
    at de.uniwue.web.io.PageXMLWriter.getPageXML(PageXMLWriter.java:68)
    at de.uniwue.web.controller.FileController.exportXML(FileController.java:214)

The problematic code seems to expect that TextEquiv always has an @index: https://github.com/OCR4all/LAREX/blob/8ccd341923c190198c4a12bb690953a2d078df8f/src/main/java/de/uniwue/web/io/PageXMLWriter.java#L261

But previous versions of LAREX exported all my GT files with an empty first TextEquiv, followed by an @index=1 variant containing the manual text:

        <TextEquiv>
          <Unicode/>
        </TextEquiv>
        <TextEquiv index="1">
          <Unicode>224,43</Unicode>
        </TextEquiv>

So can I consider this a regression, or do I have to fix all my files first? (The PAGE schema does allow @index to be omitted, BTW.)
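To make the point about the optional attribute concrete, here is a minimal sketch that reads the two TextEquiv elements from the snippet above with the JDK's DOM parser and treats a missing @index as absent. The class name TextEquivIndexDemo, the helpers readIndex and readIndices, and the wrapping TextLine element are assumptions made for illustration; this is not LAREX code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class TextEquivIndexDemo {
    // Hypothetical helper: returns the @index of a TextEquiv element,
    // or an empty OptionalInt when the attribute is not present.
    static OptionalInt readIndex(Element textEquiv) {
        // Element.getAttribute returns "" for a missing attribute.
        String raw = textEquiv.getAttribute("index");
        return raw.isEmpty() ? OptionalInt.empty() : OptionalInt.of(Integer.parseInt(raw));
    }

    // Hypothetical helper: collects the (optional) index of every TextEquiv in the XML string.
    static List<OptionalInt> readIndices(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("TextEquiv");
        List<OptionalInt> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(readIndex((Element) nodes.item(i)));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // The TextLine wrapper is assumed here; the issue only shows the TextEquiv elements.
        String xml = "<TextLine>"
                + "<TextEquiv><Unicode/></TextEquiv>"
                + "<TextEquiv index=\"1\"><Unicode>224,43</Unicode></TextEquiv>"
                + "</TextLine>";
        // The first entry is empty (no @index), the second holds 1.
        System.out.println(readIndices(xml));
    }
}
```

Representing the index as an OptionalInt forces every caller to decide what a missing @index means, instead of hitting a null at some later comparison.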

maxnth commented 3 years ago

> So can I consider this a regression

Yes. I fixed another bug on the line you quoted and completely forgot about TextEquivs without an index. Should be an easy fix.
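As a rough illustration of this kind of bug, the sketch below shows one common way a missing index can trigger a NullPointerException in Java (comparing a null Integer with an int forces unboxing) and a null-safe alternative. This is an assumption about the failure mode, not a reading of PageXMLWriter line 261 or of the eventual fix; the TextEquiv class here is a hypothetical stand-in, and all method names are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

public class NullSafeIndexLookup {
    // Hypothetical stand-in for a PAGE TextEquiv; index is null when @index is absent.
    static final class TextEquiv {
        final Integer index;
        final String unicode;

        TextEquiv(Integer index, String unicode) {
            this.index = index;
            this.unicode = unicode;
        }
    }

    // Unsafe: comparing Integer with int unboxes te.index,
    // which throws NullPointerException when the index is null.
    static boolean hasIndexUnsafe(TextEquiv te, int wanted) {
        return te.index == wanted;
    }

    // Null-safe: Objects.equals returns false when te.index is null.
    static boolean hasIndex(TextEquiv te, int wanted) {
        return Objects.equals(te.index, wanted);
    }

    // Finds the first TextEquiv with the wanted index, skipping entries without one.
    static Optional<TextEquiv> findByIndex(List<TextEquiv> equivs, int wanted) {
        return equivs.stream().filter(te -> hasIndex(te, wanted)).findFirst();
    }

    public static void main(String[] args) {
        List<TextEquiv> equivs = Arrays.asList(
                new TextEquiv(null, ""),        // empty first TextEquiv without @index
                new TextEquiv(1, "224,43"));    // manual text with index="1"

        System.out.println(findByIndex(equivs, 1).map(te -> te.unicode).orElse("<none>"));

        try {
            hasIndexUnsafe(equivs.get(0), 1);
        } catch (NullPointerException e) {
            System.out.println("unsafe comparison failed on the index-less TextEquiv");
        }
    }
}
```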

maxnth commented 3 years ago

Should be fixed by 4a79105