Closed joschrew closed 1 year ago
Indeed, this pattern is a problem.
if hasattr(self, 'workspace'):
We could easily rewrite this by testing for output_file_grp
instead of workspace
AFAICS. (It's just there to differentiate between the other, non-processing CLI contexts, like dump and help.)
The only solution I see is making workspace
a property in the processor-base-class. And then override the setter in ocrd_tesserorc and transfer the newly set workspace to the wrapped instance of TesserocrSegmentRegion as well. But I don't know if that can be done in ocr-d-core without problems.
The only solution I see is making
workspace
a property in the processor-base-class. And then override the setter in ocrd_tesserorc and transfer the newly set workspace to the wrapped instance of TesserocrSegmentRegion as well.
I don't understand how that would help. The kwarg transfer needs to happen in the constructor already.
Within the new paradigm of the changed processor API, differentiating between processing and non-processing context becomes painless. (The former can happen in setup
, including all the parameter kwarg transfers.)
But for the moment, as I said, exchanging workspace
for output_file_grp
test should be safe. (And of course, passing *args
indiscriminately, not self.workspace
explicitly.)
I don't understand how that would help. The kwarg transfer needs to happen in the constructor already.
That's the problem I am trying to describe. At the constructor we don't have the workspace yet because we are creating a cached instance. We want to set the workspace later after the constructor has finished. But when we do that (setting the workspace) the wrapped instance of TesserocrRecognize
does not get updated with the "new" workspace.
With the code snipped (ocrd_tesserocr/ocrd_tesserocr/segment_region.py) I only wanted to show that the processor wraps an instance of TesserocrRecognize which does the real work. The if hasattr(self, 'workspace')
is not a problem when the workspace is set to None
with the first positional argument.
But when we do that (setting the workspace) the wrapped instance of
TesserocrRecognize
does not get updated with the "new" workspace.
Ah, sry, I didn't see it! Yes, since we don't override the constructor, but construct and then wrap a separate processor object, many patterns don't work. So I would say this trick is bad practice and should be ruled out in the future.
IIRC I used that wrap pattern instead of overriding the constructor because I did not want the user to see the processor appear like ocrd-tesserocr-recognize (with all its complicated parameters). But the downsides are obviously too great.
I'll make a PR based on the simpler pattern.
The only solution I see is making
workspace
a property in the processor-base-class. And then override the setter in ocrd_tesserorc and transfer the newly set workspace to the wrapped instance of TesserocrSegmentRegion as well.
Yes, that would work, and maybe it's also the best pattern in general going forward for all processors. (The setter could do many other things, like syncing/closing log files, releasing locks, syncing the METS cache or METS server.)
In the processing server pull request we want to use caching. Therefore an instance of a processor without a workspace is created. Later when the user requests a processor-run we want to set the workspace into that cache-instance.
The problem seems to be that ocrd-tesserocr processors delegates the processing internally to the processor TesserocrRecognize`: https://github.com/OCR-D/ocrd_tesserocr/blob/c10f94d2b36815314687a216495e8602b4a16bb0/ocrd_tesserocr/segment_region.py#L13-L32
The workspace is transferred to this class on creation. If the workspace is (re-)set afterwards it does not get delegated to the created instance of
TesserocrRecognize