OCR-D / ocrd_calamari

Recognize text using Calamari OCR and the OCR-D framework
Apache License 2.0

Calamari segfaults #44

Closed: Witiko closed this issue 3 years ago

Witiko commented 4 years ago

Using the ocrd/maximum Docker image from 5 days ago (2020-09-18, 9165ddaf96bc), I get a segfault when running ocrd-calamari-recognize -I OCR-D-SEG-LINE-RESEG-DEWARP -O OCR-D-OCR -p checkpoint /models/\*.ckpt.json, where the models are those from the model.tar.xz archive suggested by the OCR-D project website. Only 5 XML files are produced before the segfault, so I assume the issue is with the sixth image (OCR-D-SEG-LINE-RESEG-DEWARP_f103.xml) and its line segments. Would you like me to submit the problematic line segments for reproduction?

I noticed that a new ocrd/maximum image was published a day later. Do you suppose the changes may affect calamari?

kba commented 4 years ago

Thanks for the report, data to reproduce and fix this is much appreciated.

There was no update of ocrd_calamari since last week.

Witiko commented 4 years ago

@kba For ease of reproduction, I uploaded mets.xml, OCR-D-SEG-LINE-RESEG-DEWARP/ and OCR-D-OCR/ in the state after the segfault as a ZIP archive here. I also added information about the calamari models I used to the original post. Please let me know if you require any more information or any action on my part for reproduction.

mikegerber commented 3 years ago

  1. The METS file is a bit confusing, it contains in the OCR-D-SEG-PAGE filegroup:

All in one file group. Could you provide the workflow you used to produce this?

  2. Also, mets.xml does not contain the input file group in question, so I cannot reproduce:

Input fileGrp[@USE='OCR-D-SEG-LINE-RESEG-DEWARP'] not in METS!
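
The error above means that the requested file group is not declared in mets.xml. A quick way to check which file groups a METS file actually declares is sketched below, using only the Python standard library. The helper names list_file_groups and has_file_group are hypothetical and not part of OCR-D or ocrd_calamari; the sketch only assumes the standard METS namespace and the USE attribute on mets:fileGrp.

```python
import xml.etree.ElementTree as ET

# Standard METS namespace (assumed to be the one used by the workspace's mets.xml).
METS_NS = {"mets": "http://www.loc.gov/METS/"}


def list_file_groups(mets_source):
    """Return the USE attribute of every mets:fileGrp in a METS document.

    mets_source may be a path or a file-like object.
    Hypothetical helper for illustration; not an OCR-D API.
    """
    root = ET.parse(mets_source).getroot()
    return [grp.get("USE") for grp in root.iterfind(".//mets:fileGrp", METS_NS)]


def has_file_group(mets_source, use):
    """Return True if a file group with the given USE value is declared."""
    return use in list_file_groups(mets_source)
```

For example, has_file_group("mets.xml", "OCR-D-SEG-LINE-RESEG-DEWARP") would show whether the input file group passed via -I is present before the processor is run.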

Witiko commented 3 years ago

I am not able to reproduce the issue.

mikegerber commented 3 years ago

Alright, I am closing this issue then. Please re-open and send me/upload the data if you have the same problem again!