To be honest, we have been considering removing this module completely. The text-image segmentation on a pixel level is not required by the project specifications; rather, you expect text regions and non-text regions. In our recent version of the block-segmentation model we added more region types like graphic, image and table regions, so it should now cover non-text regions as well.

In the case of non-text regions it also wouldn't be clear which region type to add to the PAGE-XML. There is a TextRegion that covers all types of text regions, but no equivalent non-text region, only separate graphic, image and table regions.
> To be honest, we have been considering removing this module completely.
If this is anything like the tiseg in ocropus/OLD, then IMO it should be kept, for the sake of completeness. It needs a good wrapper/documentation though.
> The text-image segmentation on a pixel level is not required by the project specifications; rather, you expect text regions and non-text regions.
How can that be? The DFG call for OCR-D explicitly mentions text/image segmentation in Modul 2: Layouterkennung / Teilaufgabe 2.A Seitensegmentierung. One of your 2 MPs runs on that ticket.
> In our recent version of the block-segmentation model we added more region types like graphic, image and table regions, so it should now cover non-text regions as well.
That's a different model. It might work better under some circumstances, but worse under others.
Moreover, it's a different method (coordinates/segmentation instead of image/clipping). It might be better suited for some workflows, but worse for others.
> In the case of non-text regions it also wouldn't be clear which region type to add to the PAGE-XML. There is a TextRegion that covers all types of text regions, but no equivalent non-text region, only separate graphic, image and table regions.
The point of the clipping approach would be to not add any regions at all, but rather provide a different binarization which suppresses all the non-text foreground. Then a "normal" segmentation processor can run on this for text and table regions. (In principle, you could also provide the inverse, an image with the text foreground suppressed. Then a specialised segmentation processor could try to segment separator lines, images, graphics etc.)
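In code, that clipping step might look roughly like this (a minimal sketch only, assuming black-is-foreground binarization and a white non-text mask from the pixel classifier; the function name is made up, this is not the module's actual code):

```python
import numpy as np
from PIL import Image

def clip_nontext(binarized, nontext_mask):
    """Suppress all non-text foreground in a binarized page image.

    binarized: PIL image ('1' or 'L'), black (0) = foreground
    nontext_mask: PIL image of the same size, white (255) where
        the pixel classifier saw non-text
    """
    bin_arr = np.array(binarized.convert('L'))
    mask_arr = np.array(nontext_mask.convert('L'))
    # wherever the classifier saw non-text, overwrite with background (white)
    bin_arr[mask_arr > 127] = 255
    return Image.fromarray(bin_arr)
```

The inverse image (text foreground suppressed) would be the same operation with the text mask instead.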
> > The text-image segmentation on a pixel level is not required by the project specifications; rather, you expect text regions and non-text regions.
>
> How can that be? The DFG call for OCR-D explicitly mentions text/image segmentation in Modul 2: Layouterkennung / Teilaufgabe 2.A Seitensegmentierung. One of your 2 MPs runs on that ticket.
Yes, but as you yourself mentioned previously, as a segmentation processor, shouldn't the output be regions?
> The point of the clipping approach would be to not add any regions at all, but rather provide a different binarization which suppresses all the non-text foreground. Then a "normal" segmentation processor can run on this for text and table regions. (In principle, you could also provide the inverse, an image with the text foreground suppressed. Then a specialised segmentation processor could try to segment separator lines, images, graphics etc.)
This is exactly what it currently outputs: two images, where one contains all text pixels (suppressing the non-text pixels), the other contains all non-text pixels (suppressing the text pixels). So maybe there was a misunderstanding about what the expected output of such a module should and shouldn't be. Better documentation probably would have cleared this up earlier.
> Yes, but as you yourself mentioned previously, as a segmentation processor, shouldn't the output be regions?
If I said or implied that segmentation can only be done via structural annotation of layout elements, then I was wrong – sorry. PAGE-XML is quite liberal and could be interpreted to allow for `AlternativeImage` as a preliminary representation of segmentation, too. The actual structural annotation with `Coords` can then follow in an independent step/processor. The OCR-D call and functional model would allow that IIUC. Compare OCR-D/spec#120. Please consult with @wrznr @cneud @kba to verify or falsify.
In any case, I opened this issue merely to improve documentation (docstrings, tool json description, README), never to suggest abandoning this approach (although that's your decision of course).
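For illustration, annotating such a preliminary result with the generated PAGE API from ocrd_models might look like this (a sketch; the file names are made up, and whether `clipped` is the right feature name is exactly what the spec discussion would have to settle):

```python
from ocrd_models.ocrd_page import parse, to_xml, AlternativeImageType

# load an existing PAGE-XML file (hypothetical names)
pcgts = parse('OCR-D-SEG-TISEG_0001.xml', silence=True)
page = pcgts.get_Page()
# reference the derived (clipped) image, marking its image features
# in the comma-separated @comments attribute
page.add_AlternativeImage(AlternativeImageType(
    filename='OCR-D-SEG-TISEG/OCR-D-SEG-TISEG_0001.png',
    comments='binarized,clipped'))
with open('OCR-D-SEG-TISEG_0001.xml', 'w') as f:
    f.write(to_xml(pcgts))
```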
> So maybe there was a misunderstanding about what the expected output of such a module should and shouldn't be. Better documentation probably would have cleared this up earlier.
I concur. Maybe you can help improve the documentation here, as well as give feedback for improvement of OCR-D/spec.
Updated the documentation in 6390f22
> Updated the documentation in 6390f22
Thanks @mahmed1995. However, I am not at all satisfied with this:

- The name `mask` does not by itself convey the information that you are using the `AlternativeImage` mechanism for this.
- Please add the `clipped` feature to the (comma-separated) list of `@comments`.
Originally we had both alternative images as output. I re-added the second one again in https://github.com/mjenckel/ocrd_anybaseocr/commit/b66825391b40044a1482e34f05ab7f7784b8042a. Do you have a recommendation to indicate whether the image is text-clipped or img-clipped?

I specified the output format in the README in https://github.com/mjenckel/ocrd_anybaseocr/commit/c41294d0b8c12a751d730698e00993d173236635. As for the "method of segmentation" I am not 100% sure what you mean.

I updated all the ocrd-tool.json descriptions to indicate the output format in https://github.com/mjenckel/ocrd_anybaseocr/commit/f65587d49ea3d001a6d598e14f3d3c3fbe4d1f2a.

I also added `clipped` to the comments in https://github.com/mjenckel/ocrd_anybaseocr/commit/10b14069eb0aaad94996058e4282806ca794091a, but with two output images, there should probably be an additional indication so later processors can filter for the right one? (Or go back to only one image, but then later processors can't use the image part to further segment/classify it.)
Excellent, thanks @mjenckel!
> 1. Do you have a recommendation to indicate whether the image is text-clipped or img-clipped?
Not really, no. Your current method of adding the text part as `AlternativeImage` (marked `clipped`) but not the non-text part, while differentiating them in the METS via an ID suffix, is okay for the existing spec I guess. But maybe you can make a proposal (preferably as a PR on OCR-D/spec, or as a discussion issue) yourself. I guess one would at least have to introduce new `@comments` classes (image features), e.g. something like `text-only` and `nontext-only`. I look forward to the discussion!
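To sketch how a later processor could then select the right variant (assuming core's `Workspace.image_from_page` with its `feature_selector` argument; note that `text-only` is only my proposal above, not part of the spec):

```python
from ocrd_modelfactory import page_from_file

def get_text_only_image(workspace, input_file):
    """Resolve the text-clipped page image via its image features."""
    pcgts = page_from_file(workspace.download_file(input_file))
    page = pcgts.get_Page()
    # selects the most recent AlternativeImage whose @comments
    # contain all of the requested features
    page_image, page_coords, image_info = workspace.image_from_page(
        page, input_file.pageId,
        feature_selector='clipped,text-only')  # 'text-only' is hypothetical
    return page_image
```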
> 2. As for the "method of segmentation" I am not 100% sure what you mean.
Your current formulation is sufficient IMO. (I was again referring to the expectation of getting region types with coordinates, which this binary-only method defeats.)
The name `tiseg` and its description in the README and tool json suggest this is a segmentation processor. But it does not add regions with coordinates, only a page image with suppressed images. This should be documented more clearly. Also, consider using the image feature `clipped` for your `AlternativeImage`.