OCR-D / core

Collection of OCR-related python tools and wrappers from @OCR-D
https://ocr-d.de/core/
Apache License 2.0
119 stars 31 forks source link

document initLogging requirement #621

Open bertsky opened 4 years ago

bertsky commented 4 years ago

Using any of core's API (even just ocrd_utils) now requires calling initLogging at some point prior to usage, or the respective subpackage's getLogger will complain with CRITICAL.

So not only is it not correct anymore to do ocrd_utils.getLogger at module level, any program or library using core must also do ocrd_utils.initLogging during its initialization (but not necessarily on the module level) now.

This must be reflected somewhere in the documentation prominently.

M3ssman commented 4 years ago

Three thumbs up! Without this call any logging records in any other application that uses stuff from ocrd-core gets dropped. Why become all loggers silently PropagationShyLoggers? All our logs vanished after updating from 2.16 (behavior changed from v2.16 to v2.17, as I realized just by now).

bertsky commented 4 years ago

Without this call any logging records in any other application that uses stuff from ocrd-core gets dropped. Why become all loggers silently PropagationShyLoggers?

Sorry to hear this hit you in the back.

I don't understand what you say there exactly, though. So you have a program that imports (some of) core's packages, but does not use the Processor CLI decorator (which brings ocrd_utils.initLogging), and that program does not give you any log messages at all anymore, or just not the OCR-D log messages?

Also, what's a PropagationShyLogger?

M3ssman commented 4 years ago

Within our main migration workflow scripts I do use some functionalities of OCR-D core for workspace preparation and so on (e.g. : https://github.com/OCR-D/ocrd-website/wiki/How-to-create-searchable-Fulltext-Data-for-DFG-Viewer).

PropagationShyLogger is introduced from core/blob/master/ocrd_utils/ocrd_utils/logging.py, on the line with logging.setLoggerClass(PropagationShyLogger). This also turns any other logger from an app that uses ocrd_core-library into this type. If you stay in ocrd-realms, you might not even bother. I wonder if this was intended ... but it seems not to be that harmfull anyway, since it was already introduced back in april when we first reviewed the logging module.

This one 4ce207ee8 ocrd_utils/ocrd_utils/logging.py (Konstantin Baierer 2020-09-11 17:06:03 +0200 165) disableLogging() effectively turns off any logger in any application that imports ocrd-stuff. This leads to situations where empty log-files are being written. We really need to keep in mind this is actually breaking behavior.

The reason to introduce this was to have a short-cut to silence logging from all 3rd-party-libs ocrd uses ... ?

kba commented 4 years ago

The reason to introduce this was to have a short-cut to silence logging from all 3rd-party-libs ocrd uses ... ?

The reason for doing this was to prevent libraries like tensorflow non-processing-related logging in processors from interfering with sys.stdout/sys.stderr, so as not to break output like --dump-json.

This also turns any other logger from an app that uses ocrd_core-library into this type.

That was intentional and this behavior, that PropagationShyLogger is set as the default logger, has been around a while. What is new is that in the absence of initialization, logging is completely disabled. I can see how this can be a problem, sorry for the inconvenience.

Where should I document this?

M3ssman commented 4 years ago

If the main reason to turn off logging totally is the behavior of tensorflow, question: It is possible to integrate tensowflow into the default logging conf file? If so, I'd prefer this, since this too documents that there's a problem with tensorflow logging if left alone.

kba commented 4 years ago

It is possible to integrate tensowflow into the default logging conf file?

We should probably mirror the builtin defaults to the logging.conf, i.e.

        logging.getLogger('').setLevel(logging.INFO)
        #  logging.getLogger('ocrd.resolver').setLevel(logging.INFO)
        #  logging.getLogger('ocrd.resolver.download_to_directory').setLevel(logging.INFO)
        #  logging.getLogger('ocrd.resolver.add_files_to_mets').setLevel(logging.INFO)
        logging.getLogger('PIL').setLevel(logging.INFO)
        # To cut back on the `Self-intersection at or near point` INFO messages
        logging.getLogger('shapely.geos').setLevel(logging.ERROR)
        logging.getLogger('tensorflow').setLevel(logging.ERROR)

Maybe we should just move to using the builtin logging.conf if no logging.conf is found in the filesystem?

bertsky commented 4 years ago

If the main reason to turn off logging totally is the behavior of tensorflow, question: It is possible to integrate tensowflow into the default logging conf file?

No, not just TF. Any library that is imported by processors and already logs things at module init time is problematic for CLIs that want to control exactly what appears on stdout. We've also seen pyplot do this and some others (don't recall).

Maybe we should just move to using the builtin logging.conf if no logging.conf is found in the filesystem?

I don't really see a connection to the issue at hand, and I think it's useful to not rely on filesystem access for the defaults.

Maybe we should first try to pinpoint exactly what @M3ssman needs to import from core without breaking other logging components. For our processors, we have initLogging via CLI decorator. For the ocrd CLI we do it explicitly. In these contexts killing unwanted logging is sometimes necessary.

Maybe the problem is that we require initLogging when using getLogger?

M3ssman commented 4 years ago

@kba So we should extend the default configuration with, say

[loggers]
keys=root,ocrd_tensorflow

...

[logger_ocrd_tensorflow]
level=ERROR
handlers=consoleHandler
qualname=tensorflow

and so on?

kba commented 1 year ago

image

@bertsky:

Should be removed now IMO. (If you need to experiment on this, just configure the logger's level it in your local ocrd_logging.conf.)

@MehmedGIT:

That's what I tried but without success. When I use the ocrd_logging.conf file then it overwrites and suppresses all potential logs coming from loggers not mentioned in that configuration file. For example, uvicorn server logs. That's why we are using python logging here. @kba is working on improving the ocrd logging. Then we can remove python logging completely.

@bertsky

You can always configure default loggers (no qualname) with suitable level and handlers. And specific loggers, too. For "rogue" libraries that spam the root logger, there are workarounds. But for Uvicorn in particular, have a look at the log_config kwarg of its run method.


Posting here, since this is related, I really need to clean up the logging setup and re-align the standard-logging of #974 with the rest of the code base.