Open mikegerber opened 4 years ago
ocrd-segment-repair has the optional operations "plausibilize" and "sanitize" – I have no idea what exactly they do :)
I agree, these are not expressive enough, or even memorable (which is what...)
I would prefer something like this:
* shrink-regions-to-hull-of-lines (...or just shrink-regions?)
* whatever-plausibilize-does
ATM all it does is remove regions fully contained by others or nearly equal to them (and fix the ReadingOrder afterwards).
It's intended to become much more though, like merging or shrinking overlapping neighbouring regions, or fixing reading order via basic heuristics (e.g. no arbitrary jumps back and forth).
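For illustration, the current behaviour described above (dropping regions that are fully contained in, or nearly equal to, another region, then fixing the reading order) could be sketched roughly as follows. This is a minimal sketch and not the processor's actual code: regions are simplified to axis-aligned boxes instead of polygons, and the names `drop_redundant` and `min_covered`, as well as the 0.95 threshold, are assumptions made up for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical simplification: a region is an axis-aligned box (x0, y0, x1, y1).
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))


def drop_redundant(regions: Dict[str, Box],
                   reading_order: List[str],
                   min_covered: float = 0.95):
    """Drop regions (almost) covered by another region, then prune the order.

    min_covered is an assumed threshold: a region is redundant if at least
    this fraction of its area lies inside a region that has been kept.
    """
    kept: Dict[str, Box] = {}
    # Visit larger regions first, so of two near-equal regions the larger stays.
    for rid, box in sorted(regions.items(),
                           key=lambda kv: area(kv[1]), reverse=True):
        a = area(box)
        redundant = a == 0 or any(
            intersection(box, other) / a >= min_covered
            for other in kept.values())
        if not redundant:
            kept[rid] = box
    # "Fix the reading order": keep only references to surviving regions.
    new_order = [rid for rid in reading_order if rid in kept]
    return kept, new_order


regions = {
    "r1": (0, 0, 100, 100),
    "r2": (10, 10, 50, 50),    # fully contained in r1
    "r3": (1, 1, 100, 100),    # nearly equal to r1
    "r4": (200, 0, 300, 100),  # separate region
}
kept, order = drop_redundant(regions, ["r1", "r2", "r3", "r4"])
```

In this toy input only r1 and r4 survive, and the reading order shrinks accordingly. Merging or separating overlapping neighbours, as mentioned above, is deliberately left out of the sketch.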
Since this processor started out under the name repair but received a default behaviour of just warning about likely errors, we needed some verb for the actual action.
Maybe separate-neighbours?
@wrznr?
Right, they have very generic names, since they are intended to do various things. Right now, they do not do very much and are not ready for production use or even testing. I would rather keep the current names and see what the processors will become. Let us discuss a proper name when implementation and documentation are finished. (ocrd_segment will be my main focus in December.)
Related: qurator-spk/ocrd_repair_inconsistencies#2
Documentation from https://ocr-d.de/en/workflows:
- plausibilize = Remove redundant (almost equal or almost contained) regions, and merge overlapping regions
- sanitize = Shrink and/or expand a region in such a way that its coordinates include those of all its lines
This is actually from the ocrd-tool.json description of these parameters; see ocrd-segment-repair -h.
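To make the "sanitize" description more concrete, here is a rough sketch of shrinking and/or expanding a region to the hull of its lines. It assumes axis-aligned boxes rather than real polygons, and `sanitize_region` and its `padding` parameter are hypothetical names for this example, not part of ocrd-segment-repair.

```python
from typing import List, Tuple

# Hypothetical simplification: regions and lines are boxes (x0, y0, x1, y1).
Box = Tuple[float, float, float, float]


def sanitize_region(region: Box, lines: List[Box], padding: float = 0.0) -> Box:
    """Return a box that just encloses all lines (plus optional padding).

    The result may be smaller than the input region on some sides (shrink)
    and larger on others (expand). A region without lines is left unchanged.
    """
    if not lines:
        return region
    x0 = min(line[0] for line in lines) - padding
    y0 = min(line[1] for line in lines) - padding
    x1 = max(line[2] for line in lines) + padding
    y1 = max(line[3] for line in lines) + padding
    return (x0, y0, x1, y1)


region = (0, 0, 100, 100)
lines = [(10, 10, 90, 20), (10, 25, 110, 35)]  # second line sticks out on the right
fixed = sanitize_region(region, lines)
```

Here the region shrinks on the left, top and bottom, and grows on the right to include the second line, which is why a name like shrink-regions alone would only describe half of the operation.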
There seems to also be another thing ocrd-segment-repair does.
In other words: Make operations explicit.