bertsky opened this issue 2 months ago
To be precise, in this case the message from TF logging causing the hiccup was not a warning (as the issue suggests) but an error. All our TF-based processors now suppress log messages below that level anyway. But my point is about log messages from imported modules in general – if we find a solution here, then these special TF rules may not even be necessary anymore.
So perhaps we should do our own `basicConfig(handlers=[logging.NullHandler()], force=True)`, or alternatively just call `logging.disable(sys.maxsize)`, in core's `ocrd_cli_wrap_processor` (which will be before any processor-specific imports)?
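A minimal sketch of the second variant (the wrapper's signature here is an assumption, not core's actual code):

```python
import logging
import sys

def ocrd_cli_wrap_processor(processor_class, *args, **kwargs):
    # Raise the global disable threshold above every level, so that
    # whatever logging setup an imported library performs stays silent.
    logging.disable(sys.maxsize)
    # ... handle --dump-json, --help, --version here ...
    # Before actual processing, logging would need to be re-enabled:
    logging.disable(logging.NOTSET)
```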
Duh! That by itself won't suffice, of course: when the processor wrapper gets used, TF has usually already been imported and thus initialised.
So perhaps we should make sure that our processors don't make imports other than OCR-D related ones – until required to do `process()`. That would also speed up responses for `--list-resources`, `--dump-json` and `--help` a lot – TF and Pytorch take quite some time.
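A sketch of what such deferred imports could look like (the processor name and model loading are hypothetical):

```python
from ocrd import Processor  # cheap, OCR-D-only import at module level

class MyTFProcessor(Processor):
    def process(self):
        # The heavy framework import happens only once actual processing
        # starts – --help, --dump-json etc. never reach this point.
        import tensorflow as tf
        model = tf.keras.models.load_model(self.parameter['model'])
        # ... run inference, add results to the workspace ...
```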
> So perhaps we should make sure that our processors don't make imports other than OCR-D related ones

…which in the case of ocrd_calamari is tricky: it wants to `from tensorflow import __version__` to show that in the `--version` output (as well as in the METS agent entry after processing).
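One way out (an assumption, not what ocrd_calamari currently does) would be to read the version from the installed package metadata instead of importing the framework:

```python
from importlib.metadata import version, PackageNotFoundError

def tensorflow_version() -> str:
    # Looks up the distribution's metadata on disk – TensorFlow itself
    # is never imported, so nothing gets initialised or logged.
    try:
        return version('tensorflow')
    except PackageNotFoundError:
        return 'not installed'
```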
In another example, I get garbled JSON from the following warning:

```
/ocrd_all/venv38/lib/python3.8/site-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.12) or chardet (5.2.0)/charset_normalizer (2.0.12) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "
```
Regarding the `urllib3` problem – for anyone who encounters this, the fix is to update `requests` as specified by core:

```sh
pip install -U "requests<2.30"
```
> So perhaps we should make sure that our processors don't make imports other than OCR-D related ones – until required to do `process()`. That would also speed up responses for `--list-resources`, `--dump-json` and `--help` a lot – TF and Pytorch take quite some time.

We should implement this along with the #322 API changes – so we don't need to touch the processor code bases twice.
> …which in the case of ocrd_calamari is tricky: it wants to `from tensorflow import __version__` to show that in the `--version` output (as well as in the METS agent entry after processing).

In this case it is still possible to initialise logging before importing Tensorflow, so even that should not be a problem.
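For illustration (the module layout is assumed, not ocrd_calamari's actual code), the CLI module could simply order its imports accordingly:

```python
from ocrd_utils import initLogging

# Configure OCR-D logging first, so TensorFlow's import-time logging
# setup cannot attach a stdout handler of its own.
initLogging()

from tensorflow import __version__ as tensorflow_version  # noqa: E402
```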
As to the logging setup itself – it would not even have to be `logging.disable(...)` or `basicConfig` with no handlers or a null handler: AFAICS it would be fully sufficient if we did our normal `initLogging` prior to all other imports – as long as our ocrd_logging.config separates logging from stdout.
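The crucial property (a sketch of the principle, not OCR-D's actual configuration) is that every handler writes to stderr, leaving stdout free for machine-readable output:

```python
import logging
import sys

# All log records go to stderr; stdout stays reserved for JSON etc.
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(logging.Formatter('%(asctime)s %(name)s %(levelname)s %(message)s'))
# force=True (Python 3.8+) replaces any root handlers a library has
# already installed via its own basicConfig.
logging.basicConfig(handlers=[handler], level=logging.INFO, force=True)
```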
We have…
So because TF spewed a warning about a CPU mismatch, the `--dump-json` output is broken and `ocrd process` cannot validate the workflow.

IMO this likely affects all processors in one form or another. When we instantiate a processor in the CLI decorator, there is no `initLogging` call for the `--dump-json` target, but TF can still do its own `basicConfig`, which will use stdout.

So perhaps we should do our own `basicConfig(handlers=[logging.NullHandler()], force=True)`, or alternatively just call `logging.disable(sys.maxsize)`, in core's `ocrd_cli_wrap_processor` (which will be before any processor-specific imports)?
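To illustrate the failure mode (a contrived repro, not OCR-D code – the warning text is made up): any stray line on stdout ahead of the JSON payload makes the consumer's parse fail:

```python
import json

# What `ocrd process` effectively reads from stdout when an imported
# library has logged there before the --dump-json payload:
polluted = 'W tensorflow: CPU supports AVX2 FMA ...\n{"executable": "ocrd-dummy"}'

json.loads(polluted)  # raises JSONDecodeError: Expecting value: line 1 column 1
```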