OCR4all / LAREX

A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.
MIT License

500 (Internal Server Error) on Segmentation request #113

Closed novacellus closed 5 years ago

novacellus commented 6 years ago

The ajax call to "/segment" fails with a 500 error upon opening a newly uploaded book. The response header is:

HTTP/1.1 500 Internal Server Error
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=UTF-8
Content-Language: pl-PL
Content-Length: 4122
Date: Sat, 14 Jul 2018 10:21:53 GMT
Connection: close

Previous requests (resize etc.) work fine, and the error is followed by another: "Uncaught TypeError: Cannot read property 'status' of undefined". Everything works fine with the provided test resource, so it may well be a server configuration or Docker-related issue, as I'm deploying Larex in a Docker container.

The book folder contains a dozen ~200 KB .tif files, each with an accompanying PAGE XML file (the latter don't seem to be loaded by default, though).

request_header.txt request_payload.txt

Nesbi commented 6 years ago

Thank you for bringing this to our attention.

Off the top of my head, I'd imagine that the server can't find or is unable to load the images.

Are you unable to load your book into Larex at all? The book should be deployed correctly if it shows up with the correct name ("foldername" -> images[.jpg|.png|.tiff|.tif|.bmp]) in the home/library.

Do the pages appear on the left side? If yes, are there pages that are possible to select and segment?

And what are your settings in the larex.config?

I have personally never used Larex in a Docker container, but I think it has been used with Docker at our chair before.

Nesbi commented 6 years ago

(Closing the issue was a mistake. I've reopened it immediately)

novacellus commented 6 years ago

Off the top of my head, I'd imagine that the server can't find or is unable to load the images.

This was my first thought too. The thing was the book was loading correctly up to this point, the thumbnails got created and the web interface seemed to load as well: screenshot1

novacellus commented 6 years ago

I kept tracking the error and looked into the Tomcat logs. I now know what the problem was: in the same folder as the TIFFs there were also pre-created OCR PAGE XML files that I was forcing Larex to load (I was thinking about using it to correct and analyse post-OCR segmentation). The error that Tomcat throws is:

java.lang.IllegalArgumentException: No enum constant larex.regions.type.RegionType.

In fact, in my .xml there are no "semantic" region types (this was exactly what I needed Larex for) and some specification is seemingly required:

public enum RegionType.

A simple workaround would be to declare "other" as the value of type. But it may also be that I'm trying to use Larex for something it wasn't created for in the first place. :-)
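To illustrate why an empty type produces exactly this message, here is a minimal sketch. The `RegionType` enum below is a hypothetical stand-in with made-up constants, not LAREX's actual enum, and `lenient` is only one possible way a parser could fall back to "other" instead of failing.

```java
import java.util.Locale;

public class LenientRegionType {
    // Hypothetical stand-in for larex.regions.type.RegionType; the real constants may differ.
    enum RegionType { PARAGRAPH, HEADING, IMAGE, OTHER }

    // Strict lookup: Enum.valueOf throws IllegalArgumentException for an empty or unknown name.
    static RegionType strict(String raw) {
        return RegionType.valueOf(raw);
    }

    // Lenient lookup: fall back to OTHER when the type is missing or not recognized.
    static RegionType lenient(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return RegionType.OTHER;
        }
        try {
            return RegionType.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return RegionType.OTHER;
        }
    }

    public static void main(String[] args) {
        try {
            strict(""); // an empty type attribute, as in the PAGE XML described above
        } catch (IllegalArgumentException e) {
            // The message ends with the enum's name and a trailing dot, because the constant name is empty.
            System.out.println(e.getMessage());
        }
        System.out.println(lenient(""));
        System.out.println(lenient("heading"));
    }
}
```

The trailing dot in the logged message is the giveaway: the constant name that should follow it is the empty string.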

Anyway, I also attach my Dockerfile should someone find it useful. Dockerfile.txt

Nesbi commented 6 years ago

java.lang.IllegalArgumentException: No enum constant larex.regions.type.RegionType.

In fact, in my .xml there are no "semantic" region types (this was exactly what I needed Larex for) and some specification is seemingly required:

Yeah that would make sense :)

I'll consult with @chreul next week about this.

Would you be so kind to share with us what your use case for LAREX is? :)

[My answer before your newest comment] OK, this rules out that the server is unable to find the book/images.

Your payload and header seem to be alright as well.

So something goes wrong at the segmentation step.

You said that "each [Image comes] with an accompanying PAGE XML file (the latter don't seem to be loaded by default, though)." What makes you think they aren't loaded by default? The newest version should load those PAGE XML files by default.

novacellus commented 6 years ago

Setting RegionType type to "other" resolved the issue. The blocks get smoothly loaded!
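For anyone who needs to apply the same workaround to many files, a rough sketch follows. It assumes the missing value is the `type` attribute on `TextRegion` elements. The class name, the sample XML and its `urn:example:page` namespace are placeholders for illustration, not taken from the files in this issue.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PageTypeDefault {
    // Sets type="other" on every TextRegion whose type attribute is missing or empty.
    // Returns the number of regions that were changed.
    static int fillMissingTypes(Document doc) {
        // "*" matches any namespace, so the PAGE schema version does not matter here.
        NodeList regions = doc.getElementsByTagNameNS("*", "TextRegion");
        int changed = 0;
        for (int i = 0; i < regions.getLength(); i++) {
            Element region = (Element) regions.item(i);
            if (region.getAttribute("type").isEmpty()) {
                region.setAttribute("type", "other");
                changed++;
            }
        }
        return changed;
    }

    // Parses an XML string with namespace awareness enabled.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Placeholder document: one region without a type, one with a type.
        String xml = "<PcGts xmlns=\"urn:example:page\"><Page>"
                + "<TextRegion id=\"r1\"/><TextRegion id=\"r2\" type=\"heading\"/>"
                + "</Page></PcGts>";
        Document doc = parse(xml);
        System.out.println("regions changed: " + fillMissingTypes(doc));
    }
}
```

Writing the patched document back to disk (for example with a `javax.xml.transform.Transformer`) is left out to keep the sketch short.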

Would you be so kind to share with us what your use case for LAREX is? :)

Of course, I'll pm you during the weekend. Thanks!

Nesbi commented 5 years ago

Larex now supports all PAGE Region types