OCR-D / ocrd_tesserocr

Run tesseract with the tesserocr bindings with @OCR-D's interfaces
MIT License

Improve tessapi reset #179

Closed (bertsky closed this 3 years ago)

bertsky commented 3 years ago

This makes it possible to pass in parameters such as user_words_file or user_patterns_file (even via xpath_parameters), which Tesseract evaluates only during model initialization.

It also removes the performance penalty incurred by the blanket per-segment reset introduced in #175.

Plus, it fixes a glitch in 318cefd that made all choices disappear when running with textequiv_level=glyph.
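The first two points (init-only parameters, and avoiding a blanket reset per segment) can be illustrated with a minimal sketch. This is not ocrd_tesserocr's actual code: `INIT_ONLY`, `StubEngine` and `LazyReinit` are hypothetical names, and the stub stands in for a real engine handle such as tesserocr's `PyTessBaseAPI`, where init-only variables would have to go to `Init()` and all others can be set with `SetVariable()`. The assumed policy: re-initialize only when the init-only parameters change, or when the previous segment set a runtime variable that can only be cleared by a reset; otherwise just set the runtime variables.

```python
from dataclasses import dataclass, field

# Hypothetical subset of Tesseract parameters that are only read at model initialization.
INIT_ONLY = frozenset({"user_words_file", "user_patterns_file"})


@dataclass
class StubEngine:
    """Stand-in for an OCR engine handle; records every (re-)initialization."""
    init_calls: list = field(default_factory=list)
    variables: dict = field(default_factory=dict)

    def init(self, init_vars):
        # A real engine would reload the model here (expensive) and reset
        # all runtime variables to their defaults.
        self.init_calls.append(dict(init_vars))
        self.variables = dict(init_vars)

    def set_variable(self, name, value):
        # Cheap: only meant for parameters that are not init-only.
        self.variables[name] = value


class LazyReinit:
    """Re-initialize only when needed instead of resetting for every segment."""

    def __init__(self, engine, base_params):
        self.engine = engine
        self.base = dict(base_params)
        self.current_init = None  # init-only params of the last initialization
        self.last_runtime = {}    # runtime params applied for the previous segment

    def configure(self, overrides=None):
        """Apply base params plus per-segment overrides (e.g. from an XPath match)."""
        params = {**self.base, **(overrides or {})}
        init_vars = {k: v for k, v in params.items() if k in INIT_ONLY}
        runtime_vars = {k: v for k, v in params.items() if k not in INIT_ONLY}
        # A runtime variable set for the previous segment but absent now
        # can only be restored to its default by re-initializing.
        stale = self.last_runtime.keys() - runtime_vars.keys()
        if init_vars != self.current_init or stale:
            self.engine.init(init_vars)
            self.current_init = init_vars
        for name, value in runtime_vars.items():
            self.engine.set_variable(name, value)
        self.last_runtime = runtime_vars
```

Under this policy, consecutive segments that share the same init-only parameters reuse the loaded model, while a segment that overrides user_words_file triggers exactly one re-initialization.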

codecov[bot] commented 3 years ago

Codecov Report

Merging #179 (11ef63f) into master (ab5f9d0) will increase coverage by 0.93%. The diff coverage is 34.72%.


@@            Coverage Diff             @@
##           master     #179      +/-   ##
==========================================
+ Coverage   30.60%   31.54%   +0.93%     
==========================================
  Files          12       12              
  Lines        1382     1379       -3     
  Branches      321      317       -4     
==========================================
+ Hits          423      435      +12     
+ Misses        875      864      -11     
+ Partials       84       80       -4     
Impacted Files                  Coverage Δ
ocrd_tesserocr/recognize.py     31.52% <34.72%> (+1.55%)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.

bertsky commented 3 years ago

We should add tests for the (re-)initialization and xpath_parameters logic at some point.

Definitely. And then refactor to make the code readable again. I just don't have the time...