OCR4all / LAREX

A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.
MIT License

Opening tif returns Error code: 500 #216

Closed helkejaa closed 3 years ago

helkejaa commented 3 years ago

Opening a book that contains at least one TIFF file results in error code 500. I'm quite unsure whether this is a feature of the system or a problem in my own system. Opening PNG or JPG files works fine. For comparison (although possibly not a comparable feature), having an XML file in ALTO format that corresponds to an image file results in the book opening, but with a buffering sign on the relevant picture (that is, the book opens, no 500).

Otherwise I love the system. Handling tifs should be relevant since the format is widely used in various scanning and photo equipment.

maxnth commented 3 years ago

I'm quite unsure whether this is a feature of the system or a problem in my own system

It's indeed a problem with LAREX; I was able to reproduce it as well. This is most likely happening because ImageIO for Java 8 doesn't natively support reading TIFF files. TIFF support was added in Java 9, but since we're most likely going to migrate from Java 8 straight to Java 17 sometime in the future, jumping to Java 9 isn't really a viable option for us. I'll look into adding TIFF support some other way (e.g. using JAI) for the upcoming release.
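To illustrate the diagnosis, here is a minimal sketch (not LAREX's actual code) that asks ImageIO whether any reader is registered for a given format and turns the silent `null` from `ImageIO.read` into an explicit error. The class name `TiffSupportCheck` and both helper methods are hypothetical and only serve the example.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

// Hypothetical helper class for illustration; not part of LAREX.
final class TiffSupportCheck {

    // True if the running JVM has at least one ImageIO reader for the format name.
    static boolean hasReaderFor(String formatName) {
        return ImageIO.getImageReadersByFormatName(formatName).hasNext();
    }

    // ImageIO.read returns null when no registered reader can decode the input,
    // so the null is converted into an exception with a readable message.
    static BufferedImage readOrThrow(File file) throws IOException {
        BufferedImage image = ImageIO.read(file);
        if (image == null) {
            throw new IOException("No ImageIO reader can decode " + file.getName());
        }
        return image;
    }

    public static void main(String[] args) {
        // The TIFF result depends on the JVM version and on installed plugins.
        System.out.println("png reader available:  " + hasReaderFor("png"));
        System.out.println("tiff reader available: " + hasReaderFor("tiff"));
    }
}
```

On a plain Java 8 runtime without extra plugins the TIFF check would be expected to report `false`, which matches the behaviour described above. A check like this could let the server reject unsupported files with a clear message instead of failing with a 500.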

having an XML file in ALTO format that corresponds to an image file results in the book opening, but with a buffering sign on the relevant picture

We currently only support PAGE XML, so LAREX can't load any annotations from the ALTO file, and this leads to the infinite buffering sign. Adding some kind of warning in cases like this, along with the possibility to discard "invalid" XML files in the GUI, would probably make sense. We'll look into it.
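One way such a warning could be triggered is sketched below, assuming it is enough to inspect the root element: PAGE XML documents have a `PcGts` root, whereas ALTO documents have an `alto` root. The class `PageXmlCheck` and its method are hypothetical and do not reflect how LAREX actually validates files.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of LAREX.
final class PageXmlCheck {

    // Returns true only if the stream parses as XML and its root element is PcGts.
    // Any parse failure is treated as "not PAGE XML" so the caller can show a warning.
    static boolean looksLikePageXml(InputStream in) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(in);
            Element root = doc.getDocumentElement();
            return "PcGts".equals(root.getLocalName());
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String page = "<PcGts xmlns=\"http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15\"><Page/></PcGts>";
        String alto = "<alto xmlns=\"http://www.loc.gov/standards/alto/ns-v4#\"><Layout/></alto>";
        System.out.println(looksLikePageXml(new ByteArrayInputStream(page.getBytes(StandardCharsets.UTF_8)))); // true
        System.out.println(looksLikePageXml(new ByteArrayInputStream(alto.getBytes(StandardCharsets.UTF_8)))); // false
    }
}
```

A root-element check is deliberately shallow: it does not validate against the PAGE schema, but it is cheap and sufficient to tell an ALTO file apart from a PAGE file before attempting to load annotations.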

maxnth commented 3 years ago

TIFF support was added in 1074420a1b2ee6284de45b5ad9e8933832075ecf

maxnth commented 3 years ago

Invalid XML files will now just lead to an error message in the GUI and won't "freeze" LAREX with the infinite buffering icon anymore (added in 4e0208c3de1f08e09e3b3e4e010fce88fe4b42e0)