OCR-D / ocrd_tesserocr

Run tesseract with the tesserocr bindings with @OCR-D's interfaces
MIT License
38 stars 11 forks source link

Don't raise exception for empty intersection (fix issue #139) #141

Closed stweil closed 4 years ago

stweil commented 4 years ago

Signed-off-by: Stefan Weil sw@weilnetz.de

codecov[bot] commented 4 years ago

Codecov Report

Merging #141 into master will decrease coverage by 0.03%. The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #141      +/-   ##
==========================================
- Coverage   37.77%   37.73%   -0.04%     
==========================================
  Files           9        9              
  Lines         998      999       +1     
  Branches      212      212              
==========================================
  Hits          377      377              
- Misses        555      556       +1     
  Partials       66       66              
Impacted Files Coverage Δ
ocrd_tesserocr/segment_region.py 53.28% <0.00%> (-0.36%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update a370312...bff7dcc. Read the comment docs.

kba commented 4 years ago

But if the child element is outside the parent element, that is an error that should be adressed. I agree that raising an exception shifts the responsibility for debugging to the user, that is not ideal. But this should be debugged nonetheless.

stweil commented 4 years ago

But this should be debugged nonetheless.

Sure, that's why I added LOG.error. But you are right, this is not a fix but a workaround, so issue #139 should be kept open even with this commit.

stweil commented 4 years ago

But if the child element is outside the parent element, that is an error that should be adressed.

Maybe both parent and child are empty, then the child is not outside, but the same error is triggered.

bertsky commented 4 years ago

Introducing this exception was a conscious decision for the current state of coordinate consistency in OCR-D.

We know that most processors currently write illegal coordinates sometimes, which ultimately fails at validation time with lots of hard to debug or trace or visualise error messages. But shifting the burden for correct coordinates from validation to implementation is a large effort, and we need core to support this as much as possible. That's where this idea will step in.

But looking at the example in #139 I can now see that this will happen much more frequently than I thought in ocrd_tesserocr. So yes, we need to offer some workaround as well. I'll have to think about which is preferable: fixing this instance of #139 directly or downgrading to an error message...

stweil commented 4 years ago

Issue #139 is fixed now with a better solution, so closing this pull request.