stweil opened this issue 4 years ago.
Hi, thanks for pointing this out. Do you have example files handy?
Sure, here is an example: https://ub-backup.bib.uni-mannheim.de/~stweil/prima-page-converter-issue-15/.
I just added a minus sign to one of the coordinates to make the conversion fail, even with the latest release.
This is because the converter doesn't load invalid XMLs. The exception is thrown because the page object is null.
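For illustration, a minimal Java sketch of the kind of null check that would turn this NullPointerException into a readable error message. It assumes, as described above, that loading returns null when the input is rejected; `Page`, `loadPage` and `handleNegativeCoordinates` here are hypothetical stand-ins, not the actual PRImA classes or methods.

```java
import java.util.Optional;

// Hypothetical stand-in for the converter's page model; NOT the PRImA API.
class Page {
    final String id;
    Page(String id) { this.id = id; }
}

class NullGuardSketch {
    // Hypothetical loader: returns null when it rejects the input,
    // mirroring the behaviour described in this thread.
    static Page loadPage(String xml) {
        return (xml == null || xml.isBlank()) ? null : new Page("page1");
    }

    // Wrap the nullable result so callers are forced to handle the failure case.
    static Optional<Page> tryLoad(String xml) {
        return Optional.ofNullable(loadPage(xml));
    }

    // Stand-in for a post-processing step such as handling negative coordinates.
    static String handleNegativeCoordinates(Page page) {
        return "processed " + page.id;
    }

    // Only run the post-processing step if loading succeeded.
    static String run(String xml) {
        return tryLoad(xml)
                .map(NullGuardSketch::handleNegativeCoordinates)
                .orElse("error: input could not be loaded (invalid or unsupported PAGE XML?)");
    }

    public static void main(String[] args) {
        System.out.println(run("<PcGts/>")); // prints "processed page1"
        System.out.println(run(""));         // prints the error message instead of throwing an NPE
    }
}
```

The point of the sketch is only the control flow: report "could not load" at the place where loading fails, rather than letting a later step dereference null.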
> This is because the converter doesn't load invalid XMLs
The samples @stweil posted are valid PAGE 2019.
Yes, but I explained above how to make them invalid by adding a minus sign, which triggers the crash. We had negative coordinates in earlier releases of OCR-D.
I mean that https://ub-backup.bib.uni-mannheim.de/~stweil/prima-page-converter-issue-15/FILE_0006_OCR-D-OCR-TESS-bad.xml does have a negative coordinate in region region0003_line0001_word0003
but is still valid according to the schema, so "the converter doesn't load invalid XMLs" does not seem to answer the question.
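To see where such negative values sit in a file, one can scan every `Coords` element and report the id of its parent (region, line or word). This is a hypothetical helper, not part of PageConverter; it assumes the `points="x1,y1 x2,y2 ..."` attribute form, uses only the JDK's DOM parser and does not validate against the schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class NegativeCoordsScanner {
    // Parse without schema validation; namespace-aware so the "*" lookups below work.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // True if any value in a "x1,y1 x2,y2 ..." string is negative.
    static boolean hasNegative(String points) {
        for (String pair : points.trim().split("\\s+")) {
            for (String value : pair.split(",")) {
                if (value.startsWith("-")) return true;
            }
        }
        return false;
    }

    // Ids of the elements whose Coords contain at least one negative value.
    static List<String> findNegative(String xml) {
        List<String> ids = new ArrayList<>();
        // "*" matches any namespace, so this does not depend on one PAGE version.
        NodeList coords = parse(xml).getElementsByTagNameNS("*", "Coords");
        for (int i = 0; i < coords.getLength(); i++) {
            Element c = (Element) coords.item(i);
            Node parent = c.getParentNode();
            if (hasNegative(c.getAttribute("points")) && parent instanceof Element) {
                ids.add(((Element) parent).getAttribute("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String xml = "<PcGts xmlns=\"http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15\">"
                + "<Page imageFilename=\"x.png\" imageWidth=\"100\" imageHeight=\"100\">"
                + "<TextRegion id=\"r1\"><Coords points=\"0,0 10,0 10,10 0,10\"/></TextRegion>"
                + "<TextRegion id=\"r2\"><Coords points=\"-3,0 10,0 10,10 0,10\"/></TextRegion>"
                + "</Page></PcGts>";
        System.out.println(findNegative(xml)); // prints [r2]
    }
}
```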
Here is an example created with the ABBYY FineReader SDK that causes a NullPointerException:
https://digi.ub.uni-heidelberg.de/diglitData/v/justinian1627bd1_-_0009.abbyy.xml
```
> java -jar ~/ocr-fileformat/vendor/JPageConverter/PageConverter.jar -source-xml 0009.line.xml -target-xml 0009.page.xml -neg-coords toZero
Exception in thread "main" java.lang.NullPointerException
    at org.primaresearch.dla.page.converter.PageConverter.handleNegativeCoordinates(PageConverter.java:449)
    at org.primaresearch.dla.page.converter.PageConverter.run(PageConverter.java:266)
    at org.primaresearch.dla.page.converter.PageConverter.main(PageConverter.java:161)
```
A minimalistic, text-only tool from me: https://gist.github.com/jbarth-ubhd/4826031b9de3b9c394be0da40bee14b6
`PageConverter` crashes when given a negative coordinate, even with `-neg-coords toZero`.
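Judging only by its name, `-neg-coords toZero` should clamp negative values in each `points` string to 0 instead of failing. A minimal sketch of that assumed behaviour; `clampToZero` is a hypothetical helper, not the converter's own code.

```java
import java.util.StringJoiner;

class ClampToZeroSketch {
    // Clamp every x and y in a "x1,y1 x2,y2 ..." points string to a minimum of 0.
    static String clampToZero(String points) {
        StringJoiner out = new StringJoiner(" ");
        for (String pair : points.trim().split("\\s+")) {
            String[] xy = pair.split(",");
            if (xy.length != 2) {
                throw new IllegalArgumentException("malformed point: " + pair);
            }
            int x = Math.max(0, Integer.parseInt(xy[0]));
            int y = Math.max(0, Integer.parseInt(xy[1]));
            out.add(x + "," + y);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(clampToZero("-3,5 10,-2 10,10")); // prints 0,5 10,0 10,10
    }
}
```

Clamping can flatten part of a polygon onto the image border, which seems acceptable for a "make it loadable" option, but it does change the geometry.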
The same exception also occurs with PAGE XML input that has no `TextRegion` but an empty `ReadingOrder`. That is not valid PAGE XML, but it could perhaps be tolerated, too.
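Until the converter tolerates this case, a wrapper script could detect it before calling `PageConverter`. A minimal sketch using the JDK's DOM parser, assuming "empty" means a `ReadingOrder` element with no child elements; `StructureCheckSketch` and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class StructureCheckSketch {
    // Parse without schema validation; namespace-aware so "*" matches any PAGE namespace.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // True if the node has at least one element child (whitespace text is ignored).
    static boolean hasElementChild(Node n) {
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) return true;
        }
        return false;
    }

    // The case from this thread: no TextRegion at all, but an empty ReadingOrder.
    static boolean noRegionsButEmptyReadingOrder(String xml) {
        Document doc = parse(xml);
        if (doc.getElementsByTagNameNS("*", "TextRegion").getLength() > 0) return false;
        NodeList orders = doc.getElementsByTagNameNS("*", "ReadingOrder");
        for (int i = 0; i < orders.getLength(); i++) {
            if (!hasElementChild(orders.item(i))) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String ns = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15";
        String bad = "<PcGts xmlns=\"" + ns + "\"><Page><ReadingOrder>\n</ReadingOrder></Page></PcGts>";
        String ok = "<PcGts xmlns=\"" + ns + "\"><Page><TextRegion id=\"r1\"/></Page></PcGts>";
        System.out.println(noRegionsButEmptyReadingOrder(bad)); // prints true
        System.out.println(noRegionsButEmptyReadingOrder(ok));  // prints false
    }
}
```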