OCR4all / LAREX

A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.
MIT License

Saving during Ground Truth Production is confusing #226

Closed (b2m closed this issue 3 years ago)

b2m commented 3 years ago

I am in the Ground Truth Production step in OCR4all and am switching between the Segmentation, Lines and Text modes.

When I forget to press save before switching modes, the saved data becomes inconsistent, which results in data loss.

Example steps to reproduce (reduced to minimum):

  1. Load Project
  2. Perform Preprocessing
  3. Go to Ground Truth Production
  4. Add segments (no save)
  5. Add lines (no save)
  6. Add text and save
  7. Check Page XML => no Page XML created.
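For step 7, a quick way to see what actually ended up in a saved file is to count its regions, lines and transcribed text entries. The following is a minimal sketch, not part of LAREX or OCR4all: the function name `summarize_page_xml` and the sample document are made up for illustration, and only the PAGE element names `TextRegion`, `TextLine`, `TextEquiv` and `Unicode` are taken from the PAGE XML format.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written PAGE XML document, used only for illustration.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="p.png" imageWidth="10" imageHeight="10">
    <TextRegion id="r1">
      <TextLine id="l1"><TextEquiv><Unicode>abc</Unicode></TextEquiv></TextLine>
      <TextLine id="l2"/>
    </TextRegion>
  </Page>
</PcGts>"""


def summarize_page_xml(xml_text: str) -> dict:
    """Count regions, lines and non-empty line transcriptions in a PAGE XML string."""
    root = ET.fromstring(xml_text)
    # "{*}" matches any namespace (Python 3.8+), so the schema version does not matter.
    regions = root.findall(".//{*}TextRegion")
    lines = root.findall(".//{*}TextLine")
    texts = [
        u.text
        for u in root.findall(".//{*}TextLine/{*}TextEquiv/{*}Unicode")
        if u.text and u.text.strip()
    ]
    return {"regions": len(regions), "lines": len(lines), "texts": len(texts)}


if __name__ == "__main__":
    print(summarize_page_xml(SAMPLE))
```

If no file was written at all, as in the steps above, there is nothing to parse; the check is only useful once a save has produced a file.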

Expectation: I would expect either an automatic save or a forced save (when changes are present) on every mode switch. Alternatively, pressing save in one mode, e.g. Text mode, should also save the changes made in Lines and Segmentation mode.
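The expected behaviour could be sketched with a simple dirty flag. This is only an illustration of the idea, not how LAREX is implemented: the class `EditorSession`, its methods and the `save_fn` callback are all hypothetical names.

```python
import copy


class EditorSession:
    """Hypothetical editor state shared by the Segmentation, Lines and Text modes."""

    def __init__(self, save_fn):
        self.mode = "segmentation"
        self.page = {"segments": [], "lines": [], "text": []}
        self.dirty = False
        self._save_fn = save_fn  # assumed callback that persists the whole page

    def edit(self, key, value):
        """Record a change and mark the session as having unsaved data."""
        self.page[key].append(value)
        self.dirty = True

    def save(self):
        """Persist the complete page, not only the data of the active mode."""
        self._save_fn(copy.deepcopy(self.page))
        self.dirty = False

    def switch_mode(self, new_mode):
        """Save automatically if there are unsaved changes, then switch."""
        if self.dirty:
            self.save()
        self.mode = new_mode


if __name__ == "__main__":
    saved = []
    session = EditorSession(saved.append)
    session.edit("segments", "r1")
    session.switch_mode("lines")  # triggers the automatic save
    print(len(saved), saved[-1])
```

Either half of the sketch covers the expectation: `switch_mode` gives the automatic save, and `save` always writes the data of all three modes.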

Version: LAREX was used via Docker with the current ls6uniwue/ocr4all:latest image.

maxnth commented 3 years ago

Thanks for these bug reports, they're really helpful. Most users and testers had semi-automatic use cases and most likely only used the Lines tab to alter, remove or add lines after the automatic line segmentation had been applied, so these kinds of bugs flew under our radar until now.

This specific bug is triggering because LAREX – for some reason – expects a set reading order for text lines and is crashing when no line reading order exists. This obviously shouldn't happen and I'll look into fixing this ASAP.
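One defensive way to handle a missing line reading order is to fall back to the position of the lines on the page. The sketch below is an assumption about what such a fallback could look like, not the actual LAREX fix: the function `ordered_line_ids` and the dictionary keys `id`, `x` and `y` are invented for the example.

```python
def ordered_line_ids(lines, reading_order=None):
    """Return line ids in reading order, falling back to page position.

    `lines` is a list of dicts with hypothetical keys "id", "x" and "y";
    `reading_order` is an optional list of line ids.
    """

    def by_position(line):
        # Top-to-bottom first, then left-to-right.
        return (line["y"], line["x"])

    if not reading_order:
        # No stored order (None or empty): derive one instead of failing.
        return [line["id"] for line in sorted(lines, key=by_position)]

    known = {line["id"] for line in lines}
    # Keep the stored order, skipping ids that no longer exist.
    ordered = [line_id for line_id in reading_order if line_id in known]
    placed = set(ordered)
    # Append lines missing from the stored order instead of dropping them.
    rest = [line for line in lines if line["id"] not in placed]
    ordered += [line["id"] for line in sorted(rest, key=by_position)]
    return ordered


if __name__ == "__main__":
    demo = [
        {"id": "b", "x": 0, "y": 20},
        {"id": "a", "x": 0, "y": 10},
        {"id": "c", "x": 0, "y": 30},
    ]
    print(ordered_line_ids(demo))
    print(ordered_line_ids(demo, ["c", "a"]))
```

With a fallback like this, manually added lines can still be saved when no reading order has been set.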

b2m commented 3 years ago

Well, I also would prefer (semi-)automatic use cases =)

But to understand how things work (or why they are not working) you sometimes have to try simple things.

Also, most of the tools (OCR4all, OCR-D*) are developed with a focus on libraries and therefore on books. In the archive, we have a more diverse selection of material to process.

b2m commented 3 years ago

I was trying to reproduce the error using the current ls6uniwue/ocr4all:nightly docker image containing

The reported bug is no longer reproducible with the steps described above, so I consider it fixed.

Thank you =)