GerHobbelt / W

tracking bugs, caveats, reminders and ramblings in and of my public clones/forks
BSD 3-Clause "New" or "Revised" License

tesseract:: retry OCR with different config sets #15

Open GerHobbelt opened 3 months ago

GerHobbelt commented 3 months ago

An alternative, more powerful implementation of the current (hacky) retry_config argument:

Purpose:

GerHobbelt commented 3 months ago

The idea is to use a tesseract parameter that points at a prescription file listing the alternative runs to try, one set per line, e.g.:

# retrial prescription
# 
#   can specify simple parameters directly by assignment; other words on the line are config file references
#

lang=eng    psm=3     config_A
lang=eng+lat+fra+deu   psm=3     config_A
psm=1    config_debug_all
psm=7
psm=11
psm=13
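A minimal sketch of how such a prescription file could be parsed, assuming the format shown above: `#` lines and blank lines are skipped, words containing `=` are direct parameter assignments, and every other word is a config file reference. `RetryRound` and `ParsePrescription` are hypothetical names for illustration only; nothing here is part of the existing tesseract API.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One retry round: direct parameter assignments plus referenced config files.
// (Hypothetical type for illustration; not part of the tesseract API.)
struct RetryRound {
  std::map<std::string, std::string> params;  // e.g. {"psm" -> "3"}
  std::vector<std::string> config_files;      // e.g. {"config_A"}
};

// Parse prescription text: one round per line.
// Blank lines and lines whose first non-blank character is '#' are skipped.
std::vector<RetryRound> ParsePrescription(const std::string &text) {
  std::vector<RetryRound> rounds;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    const auto first = line.find_first_not_of(" \t\r");
    if (first == std::string::npos || line[first] == '#') {
      continue;  // blank line or comment
    }
    RetryRound round;
    std::istringstream words(line);
    std::string word;
    while (words >> word) {
      const auto eq = word.find('=');
      if (eq == std::string::npos) {
        // No '=' in the word: treat it as a config file reference.
        round.config_files.push_back(word);
      } else {
        // name=value: a direct parameter assignment.
        round.params[word.substr(0, eq)] = word.substr(eq + 1);
      }
    }
    rounds.push_back(std::move(round));
  }
  return rounds;
}
```

Fed the example above, this would yield six rounds, the first holding `lang=eng`, `psm=3` and the config file reference `config_A`. Mapping short names such as `psm` and `lang` onto actual tesseract settings is left open here.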

NOTE: I'm still pondering whether each line/set should build on the previous line's settings (i.e. not revert to factory defaults before processing the line) OR start clean every round (reset to factory defaults first). Starting clean is the more obvious behaviour, while building on the previous line makes for tighter set lists, as we only need to specify what needs tweaking for the new round... 🤔
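To make the trade-off in the note concrete, here is a rough sketch of both behaviours side by side. It assumes settings are plain name/value pairs; `Settings`, `RoundPolicy` and `EffectiveSettings` are hypothetical names, not existing tesseract code.

```cpp
#include <map>
#include <string>
#include <vector>

// Settings as plain name/value pairs (an assumption made for this sketch).
using Settings = std::map<std::string, std::string>;

// Hypothetical switch between the two behaviours under consideration.
enum class RoundPolicy {
  kResetEachRound,  // every round starts from the factory defaults
  kAccumulate       // every round builds on the previous round's settings
};

// Compute the settings each round would actually run with.
std::vector<Settings> EffectiveSettings(const Settings &defaults,
                                        const std::vector<Settings> &rounds,
                                        RoundPolicy policy) {
  std::vector<Settings> result;
  Settings current = defaults;
  for (const auto &round : rounds) {
    if (policy == RoundPolicy::kResetEachRound) {
      current = defaults;  // forget whatever earlier rounds changed
    }
    for (const auto &kv : round) {
      current[kv.first] = kv.second;  // apply this round's overrides
    }
    result.push_back(current);
  }
  return result;
}
```

With defaults `lang=eng psm=3` and the rounds `lang=eng+fra` followed by `psm=7`, the second round runs with `lang=eng psm=7` under `kResetEachRound`, but with `lang=eng+fra psm=7` under `kAccumulate`.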

GerHobbelt commented 3 months ago

Incidentally, this is another argument for why I want every option in tesseract to be steerable/configurable via a parameter: the CLI and the direct C++ API should be equally powerful and equally easy to use.