OCR-D / core

Collection of OCR-related python tools and wrappers from @OCR-D
https://ocr-d.de/core/
Apache License 2.0
118 stars 31 forks source link

regression from #1238: setup called in disposable processor instance #1255

Closed bertsky closed 2 months ago

bertsky commented 3 months ago

Example: https://app.circleci.com/pipelines/github/bertsky/ocrd_cis/6/workflows/8ce0fe22-dfdf-446d-bf16-96d26523b31c/jobs/11

Explanation:

  1. in #1238 we had to invent a "disposable" processor instance to first try to resolve preset files (including from the module directory) https://github.com/OCR-D/core/blob/79c61e303c87f229d5c96aedc0da31ef82b0f5d3/src/ocrd/decorators/__init__.py#L77-L84
  2. we could not rely on any constructor method other than full processing mode, because that's the only one which does parameter init
  3. however, that means all the attributes which are characteristic of processing mode have been set (workspace, output_file_grp, parameter etc)
  4. so all our 2.x that so far had to rely on this hasattr check to determine whether their setup should be called will now fire twice https://github.com/bertsky/ocrd_cis/blob/eb4efe1e7bd21e9a1cf5d7c18dcca4d868a92f66/ocrd_cis/ocropy/recognize.py#L91-L93
  5. processors that have required parameters without default will fail in that disposable step

There's next to no way of checking the true status from the inheriting processors, also I would really like to avoid touching them. So this should be rectified in core IMO.

Ideas?

bertsky commented 3 months ago

2. we could not rely on any constructor method other than full processing mode, because that's the only one which does parameter init

Actually, I don't remember why we should need a parameter initialisation.

We could do with some phony init (like show_version). If only resolve_resource was a class method...

How about cherry-picking https://github.com/OCR-D/core/pull/1240/commits/9827c4d18d42f36a94c65621442be29a98e7254e and making resolve_resource a class method?

bertsky commented 3 months ago

and making resolve_resource a class method?

nah, that would not work – we do have state in that function: if the constructor did a chdir to the workspace, then in order to resolve CWD location resources, we have to chdir back.