kba closed this issue 2 years ago.
This sounds very convincing to me, except for one problem: (correct me if I am wrong, but) page segmentation usually refers to finding regions, not the border. It would make more sense to call that region segmentation, just as line segmentation creates lines (so page segmentation would indeed be free for what we used to call cropping), but I have never heard that term used.
That's actually what we (@cneud, @kba and I) agreed on: to prefix segmentation with the result and not with the level of operation (i.e. segment the image into X). You are absolutely right that page segmentation usually refers to the segmentation of the page. But I prefer principled and sound solutions over traditions. 😁
It is definitely a stumbling point for newcomers and users, but I am skeptical whether researchers can easily be convinced to adopt that change in terminology. (At the least, page segmentation would have to be disambiguated verbosely for a while.)
Another established term is page frame detection. This already distinguishes itself from the physical operation (of cropping / cutting). So it might be a compromise (and a smaller deviation from tradition) to use cropping only for the image operation (not as a workflow step) in OCR-D, and to consistently use page frame detection for the process of finding the Border. As an extra, one could also refrain from using page segmentation and (provocatively but unambiguously) use region segmentation instead.
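To make the distinction concrete: page frame detection only annotates where the page is, while cropping would change the image. The following Python sketch is a deliberately naive stand-in (a real page frame detector looks for the physical page edge against the scanner background, not just the extent of the ink), and the function names are hypothetical. The only PAGE detail it assumes is that a Border's Coords carry a points attribute of the form "x,y x,y ...". The result is a polygon string; the input image is left untouched.

```python
# Toy "page frame detection": find the bounding box of foreground pixels
# and express it as a PAGE-style Border polygon. Only an annotation is
# produced; the image itself is not modified (no cropping happens here).
from typing import List, Optional, Tuple

Image = List[List[int]]  # 1 = foreground (ink), 0 = background


def detect_page_frame(img: Image) -> Optional[Tuple[int, int, int, int]]:
    """Return (x0, y0, x1, y1) of the foreground bounding box, inclusive."""
    xs = [x for row in img for x, px in enumerate(row) if px]
    ys = [y for y, row in enumerate(img) if any(row)]
    if not xs:
        return None  # blank page: no frame found
    return min(xs), min(ys), max(xs), max(ys)


def border_points(box: Tuple[int, int, int, int]) -> str:
    """Format a box as the points attribute of a Border/Coords element."""
    x0, y0, x1, y1 = box
    return f"{x0},{y0} {x1},{y0} {x1},{y1} {x0},{y1}"


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    box = detect_page_frame(page)
    print(box)                 # (1, 1, 3, 2)
    print(border_points(box))  # 1,1 3,1 3,2 1,2
```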
It is a pity that the PAGE element is called Border. Maybe we should go with border_detection on the operation levels page, region and line.
You mean instead of segmentation?
"To prefix segmentation with the result and not with the level of operation (i.e. segment image into X)."
But that (new) principle still could not be applied to page segmentation (in the new sense): Border detection does not actually segment the source image. So even with region segmentation established, I do not see a place for page segmentation, except in a broader sense covering all levels of segmentation.
Yeah! That's why I propose a completely new wording:
ocrd_tesserocr_detect_border -I ORIGINAL -O CROPPED -m mets.xml -p <(echo '{"operation_level": "page"}')
ocrd_tesserocr_detect_border -I CROPPED -O SEGMENT_REGION -m mets.xml -p <(echo '{"operation_level": "region"}')
ocrd_tesserocr_detect_border -I SEGMENT_REGION -O SEGMENT_LINE -m mets.xml -p <(echo '{"operation_level": "line"}')
I.e. foregoing the new principle.
I see. But the last 2 steps (region and line segmentation) do not actually detect any borders (i.e. outer limits) of regions and lines, they rather define those very regions and lines. IMHO we have no good reason to drop the term segmentation itself at this point.
Also, we should probably concern ourselves less with the names of components or processors here than with the terms we use to describe the workflow steps in our documentation. Processor names need to accommodate other considerations, like using imperative verb forms instead of abstract nouns (e.g. recognize for OCR, correct for OCR post-correction, rate for LM rescoring), or being true to the implementation rather than to the general operation it offers.
That being said, I don't find the existing naming scheme of ocrd_tesserocr all that bad – although I wouldn't mind a slight change like so:
ocrd-tesserocr-crop-page -I OCR-D-IMG -O OCR-D-SEG-PAGE
ocrd-tesserocr-segment-regions -I OCR-D-SEG-PAGE -O OCR-D-SEG-BLOCK
ocrd-tesserocr-segment-lines -I OCR-D-SEG-BLOCK -O OCR-D-SEG-LINE
@bertsky Is there still something to do from this discussion?
Hard to summarise, even harder to reach an agreement at this point.
We have:
We need to accommodate:
I'm afraid we cannot re-invent the wheel here, or just ignore existing terminology in the academic literature or in the field.
I suggest sticking to page frame detection when necessary to disambiguate from cropping, trying to avoid cropping as a general image operation, keeping the idiomatic page segmentation as a segmentation of pages into regions and line segmentation as a segmentation of regions into lines, but disambiguating further when necessary, and documenting all this in the glossary and specs.
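If the glossary were also kept in machine-readable form, the suggestion above could be captured roughly as follows. This is a hypothetical sketch: Term, GLOSSARY, ALIASES, lookup and describe are illustrative names, not part of the OCR-D specs, and the entries only restate the mapping from this discussion (including region segmentation as a synonym for page segmentation).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Term:
    """One workflow-step term: which PAGE level it reads, what it produces."""
    source: str   # PAGE level the step operates on
    result: str   # PAGE element(s) the step annotates
    note: str = ""


# Hypothetical glossary entries following the suggestion above.
GLOSSARY: Dict[str, Term] = {
    "page frame detection": Term("Page", "Border", "annotation only, image unchanged"),
    "page segmentation": Term("Page", "TextRegion etc.", "segments a page into regions"),
    "line segmentation": Term("TextRegion", "TextLine", "segments a region into lines"),
}

# Result-prefixed synonym (named by what is produced, not by the input level).
ALIASES: Dict[str, str] = {
    "region segmentation": "page segmentation",
}


def lookup(name: str) -> Term:
    """Resolve a term or one of its aliases; raise KeyError if unknown."""
    key = name.strip().lower()
    return GLOSSARY[ALIASES.get(key, key)]


def describe(name: str) -> str:
    """One-line summary such as 'line segmentation: TextRegion -> TextLine'."""
    t = lookup(name)
    return f"{name}: {t.source} -> {t.result}"
```

Keeping aliases separate from the canonical entries lets the documentation accept both the idiomatic and the result-prefixed wording while pointing to a single definition.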
In the docstrings, cropping currently refers to tasks that could be better described as segmenting (finding regions) or cutting (doing the actual image manipulation). This came up in #268, but finding the right terminology should not prevent a merge.
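The two meanings can be kept apart in code, too. This Python sketch shows only the cutting part: it reads an already detected Border from a PAGE document and slices the corresponding pixels out of an image. Assumptions: the PAGE 2019-07-15 namespace, a plain list-of-lists image, inclusive pixel coordinates, and cutting to the axis-aligned bounding box rather than masking the exact polygon. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumed PAGE namespace (the 2019-07-15 schema version).
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"

Image = List[List[int]]  # rows of pixel values


def border_points_from_page(xml_text: str) -> str:
    """Return the points attribute of the first Border/Coords element."""
    root = ET.fromstring(xml_text)
    coords = root.find(f".//{{{PAGE_NS}}}Border/{{{PAGE_NS}}}Coords")
    if coords is None:
        raise ValueError("no Border/Coords element found")
    return coords.get("points", "")


def parse_points(points: str) -> List[Tuple[int, int]]:
    """Parse a PAGE points string such as '1,1 3,1 3,2 1,2'."""
    result = []
    for pair in points.split():
        x, y = pair.split(",")
        result.append((int(x), int(y)))
    return result


def cut(img: Image, points: str) -> Image:
    """The image operation proper: slice out the polygon's bounding box."""
    pts = parse_points(points)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    # Coordinates are treated as inclusive pixel indices (an assumption).
    return [row[min(xs):max(xs) + 1] for row in img[min(ys):max(ys) + 1]]


if __name__ == "__main__":
    page_xml = (
        f'<PcGts xmlns="{PAGE_NS}"><Page>'
        '<Border><Coords points="1,1 3,1 3,2 1,2"/></Border>'
        "</Page></PcGts>"
    )
    img = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14], [15, 16, 17, 18, 19]]
    print(cut(img, border_points_from_page(page_xml)))  # [[6, 7, 8], [11, 12, 13]]
```

Finding the Border (segmenting) and applying it to pixels (cutting) stay in separate functions, so a workflow can annotate the page frame without ever producing a derived image.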
We should also extend the glossary.
Here are the pertinent comments on the terms:
@bertsky:
@wrznr: