OCR-D / ocrd_tesserocr

Run tesseract with the tesserocr bindings with @OCR-D's interfaces
MIT License

resource_manager already causes initLogging #169

Closed bertsky closed 3 years ago

bertsky commented 3 years ago

In the current implementation of the resmgr-based resolution of TESSDATA_PREFIX

https://github.com/OCR-D/ocrd_tesserocr/blob/fd173868d20154068d9e659c57a09c2b55a4a9bc/ocrd_tesserocr/config.py#L10-L14

… there's a module-level instantiation of OcrdResourceManager, whose constructor calls getLogger, which in turn calls initLogging.

Thus,

CRITICAL root - getLogger was called before initLogging. Source of the call:
CRITICAL root -   File "venv/lib/python3.6/site-packages/ocrd/resource_manager.py", line 26, in __init__
CRITICAL root -     self.log = getLogger('ocrd.resource_manager')
CRITICAL root - initLogging was called multiple times. Source of latest call:
CRITICAL root -   File "venv/lib/python3.6/site-packages/ocrd/decorators/__init__.py", line 49, in ocrd_cli_wrap_processor
CRITICAL root -     initLogging()

Given that we need that directory in all processors, but only to set up processing (i.e. after init / before process), how about merely exporting this as a function in config.py:

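import os
from os.path import join

from ocrd.resource_manager import OcrdResourceManager

def get_path():
    # An explicit TESSDATA_PREFIX takes precedence over the resource manager default.
    if 'TESSDATA_PREFIX' in os.environ:
        return os.environ['TESSDATA_PREFIX']
    # Instantiate lazily, so that no getLogger call happens at import time.
    location = OcrdResourceManager().default_resource_dir
    return join(location, 'ocrd-tesserocr-recognize')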

…and then in each processor, do …

    def __init__(self, *args, **kwargs):
        kwargs['ocrd_tool'] = OCRD_TOOL['tools'][TOOL]
        kwargs['version'] = OCRD_TOOL['version'] + ' (' + tesseract_version().split('\n')[0] + ')'
        super(TesserocrRecognize, self).__init__(*args, **kwargs)

        if hasattr(self, 'workspace'):
            self.setup()

    def setup(self):
        self.logger = getLogger('processor.TesserocrRecognize')
        self.tessdata = config.get_path()
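
To check that deferring the lookup really avoids the import-time side effect, here is a self-contained sketch. It makes several assumptions: `ResourceManager` is a hypothetical stand-in for OcrdResourceManager, `/hypothetical/resources` is a made-up default directory, and the `environ` parameter is added only so the function can be exercised without touching the real environment.

```python
import os
from os.path import join

constructed = []  # records every stand-in manager instantiation


class ResourceManager:
    """Hypothetical stand-in for OcrdResourceManager."""
    def __init__(self):
        constructed.append(self)
        # Made-up location, for illustration only.
        self.default_resource_dir = '/hypothetical/resources'


def get_path(environ=os.environ):
    """Resolve the tessdata directory at call time, not at import time."""
    if 'TESSDATA_PREFIX' in environ:
        return environ['TESSDATA_PREFIX']
    # The manager is only constructed when the fallback is needed.
    return join(ResourceManager().default_resource_dir,
                'ocrd-tesserocr-recognize')


class Processor:
    """Minimal stand-in for a processor with a separate setup step."""
    def __init__(self):
        self.tessdata = None

    def setup(self):
        self.tessdata = get_path()


# Defining the names above has not constructed any manager yet.
print(len(constructed))
print(get_path({'TESSDATA_PREFIX': '/opt/tessdata'}))
```

With this shape, an explicit TESSDATA_PREFIX never touches the resource manager at all, and the fallback constructs it only once setup() runs, i.e. after the CLI wrapper has had a chance to initialize logging.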

kba commented 3 years ago

Good point and thanks for the solution, I'll make a PR for it.