OCR-D / ocrd_anybaseocr


tiseg: improve documentation #31

Closed: bertsky closed this issue 4 years ago

bertsky commented 4 years ago

The name `tiseg` and its description in the README and tool json suggest this is a segmentation processor. But it does not add regions with coordinates, only a page image with suppressed images.

This should be documented more clearly. Also, consider using the image feature `clipped` for your AlternativeImage.
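
For illustration, the annotation could look roughly like this inside the processor's process method (a sketch against the OCR-D core and ocrd_models APIs; `workspace`, `page`, `clipped_image`, `file_id` and `page_id` are assumed from the surrounding processor context):

```python
from ocrd_models.ocrd_page import AlternativeImageType

# Sketch: save the text-only image to the output file group and
# reference it as an AlternativeImage on the Page, marking it with
# the 'clipped' feature in @comments (alongside existing features).
file_path = workspace.save_image_file(
    clipped_image,            # PIL image with non-text foreground suppressed
    file_id + '-IMG',         # made-up ID suffix for the derived image
    self.output_file_grp,
    page_id=page_id)
page.add_AlternativeImage(AlternativeImageType(
    filename=file_path,
    comments='binarized,clipped'))
```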

mjenckel commented 4 years ago

To be honest, we have been considering removing this module completely. Text-image segmentation on a pixel level is not required by the project specifications; rather, you expect text regions and non-text regions. In our recent version of the block-segmentation model we added more region types like graphic, image and table regions, so it should now cover non-text regions as well.

In the case of non-text regions it also would not be clear which region type to add to the PAGE-XML. There is a TextRegion that covers all types of text regions, but no equivalent non-text region, only separate graphic, image and table regions.

bertsky commented 4 years ago

> To be honest, we have been considering removing this module completely.

If this is anything like the tiseg in ocropus/OLD, then IMO it should be kept, for the sake of completeness. It needs a good wrapper/documentation though.

> Text-image segmentation on a pixel level is not required by the project specifications; rather, you expect text regions and non-text regions.

How can that be? The DFG call for OCR-D explicitly mentions text/image segmentation in Modul 2: Layouterkennung / Teilaufgabe 2.A Seitensegmentierung. One of your two MPs runs on that ticket.

> In our recent version of the block-segmentation model we added more region types like graphic, image and table regions, so it should now cover non-text regions as well.

That's a different model. It might work better under some circumstances, but worse under others.

Moreover, it's a different method (coordinates/segmentation instead of image/clipping). It might be better suited for some workflows, but worse for others.

> In the case of non-text regions it also would not be clear which region type to add to the PAGE-XML. There is a TextRegion that covers all types of text regions, but no equivalent non-text region, only separate graphic, image and table regions.

The point of the clipping approach would be to not add any regions at all, but rather provide a different binarization which suppresses all the non-text foreground. Then a "normal" segmentation processor can run on this for text and table regions. (In principle, you could also provide the inverse, an image with the text foreground suppressed. Then a specialised segmentation processor could try to segment separator lines, images, graphics etc.)
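
To make the operation concrete, a minimal sketch in plain numpy (array conventions and names are invented for illustration):

```python
import numpy as np

def clip_foreground(binarized, nontext_mask):
    """Suppress non-text foreground in a binarized page image.

    binarized:    2D uint8 array, 0 = foreground (ink), 255 = background
    nontext_mask: 2D bool array, True where the pixel classifier
                  predicted non-text (image, separator, graphic, ...)
    """
    clipped = binarized.copy()
    clipped[nontext_mask] = 255  # paint non-text pixels as background
    return clipped

# toy example: a 2x2 page with one non-text foreground pixel
page = np.array([[0, 0], [255, 0]], dtype=np.uint8)
mask = np.array([[False, True], [False, False]])
assert clip_foreground(page, mask)[0, 1] == 255
# the inverse (text foreground suppressed) is analogous:
# clipped_inv = binarized.copy(); clipped_inv[text_mask] = 255
```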

mjenckel commented 4 years ago

> Text-image segmentation on a pixel level is not required by the project specifications; rather, you expect text regions and non-text regions.

> How can that be? The DFG call for OCR-D explicitly mentions text/image segmentation in Modul 2: Layouterkennung / Teilaufgabe 2.A Seitensegmentierung. One of your two MPs runs on that ticket.

Yes, but as you mentioned yourself previously, shouldn't the output of a segmentation processor be regions?

> The point of the clipping approach would be to not add any regions at all, but rather provide a different binarization which suppresses all the non-text foreground. Then a "normal" segmentation processor can run on this for text and table regions. (In principle, you could also provide the inverse, an image with the text foreground suppressed. Then a specialised segmentation processor could try to segment separator lines, images, graphics etc.)

This is exactly what it currently outputs: two images, where one contains all text pixels (suppressing the non-text pixels) and the other contains all non-text pixels (suppressing the text pixels). So maybe there was a misunderstanding about what the expected output of such a module should and shouldn't be. Better documentation probably would have cleared this up earlier.

bertsky commented 4 years ago

> Yes, but as you mentioned yourself previously, shouldn't the output of a segmentation processor be regions?

If I said or implied that segmentation can only be done via structural annotation of layout elements, then I was wrong – sorry. PAGE-XML is quite liberal and could be interpreted to allow for AlternativeImage as a preliminary representation of segmentation, too. The actual structural annotation with Coords can then follow in an independent step/processor. The OCR-D call and functional model would allow that IIUC. Compare OCR-D/spec#120. Please consult with @wrznr @cneud @kba to verify or falsify.

In any case, I opened this issue merely to improve documentation (docstrings, tool json description, README), never to suggest abandoning this approach (although that's your decision of course).

> So maybe there was a misunderstanding about what the expected output of such a module should and shouldn't be. Better documentation probably would have cleared this up earlier.

I concur. Maybe you can help improve documentation here, as well as give feedback for improving OCR-D/spec.

mahmed1995 commented 4 years ago

Updated the documentation in 6390f22

bertsky commented 4 years ago

> Updated the documentation in 6390f22

Thanks @mahmed1995. However, I am not at all satisfied with this:

  1. The new statement is not correct. It is only the text+background part that is annotated, not the non-text+background part.
  2. It does not mention the method of segmentation used, which is the whole point of this issue. The word mask does not by itself convey the information that you are using the AlternativeImage mechanism for this.
  3. You also don't document it in the ocrd-tool.json and the processor's docstring, as I had requested.
  4. You still don't add the clipped feature to the (comma-separated) list of @comments.

mjenckel commented 4 years ago
  1. Originally we had both alternative images as output. I re-added the second one in https://github.com/mjenckel/ocrd_anybaseocr/commit/b66825391b40044a1482e34f05ab7f7784b8042a. Do you have a recommendation to indicate whether the image is text-clipped or img-clipped?

  2. I specified the output format in the README in https://github.com/mjenckel/ocrd_anybaseocr/commit/c41294d0b8c12a751d730698e00993d173236635. As for the "method of segmentation", I am not 100% sure what you mean.

  3. I updated all the ocrd-tool.json descriptions to indicate the output format in https://github.com/mjenckel/ocrd_anybaseocr/commit/f65587d49ea3d001a6d598e14f3d3c3fbe4d1f2a.

  4. I also added clipped to the comments in https://github.com/mjenckel/ocrd_anybaseocr/commit/10b14069eb0aaad94996058e4282806ca794091a, but with two output images there should probably be an additional indication so later processors can filter for the right one (see the sketch below). (Or go back to only one image, but then later processors can't use the image part to further segment/classify it.)
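
For what it's worth, OCR-D core already lets downstream processors filter on @comments features; a sketch, assuming the Workspace.image_from_page API (with `workspace`, `page` and `page_id` taken from the processor context):

```python
# Sketch: a later processor selecting the image variant it needs.
# feature_selector lists features the AlternativeImage must carry,
# feature_filter (not used here) lists features it must not carry.
page_image, page_coords, page_image_info = workspace.image_from_page(
    page, page_id,
    feature_selector='binarized,clipped')  # only images with both features
```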

bertsky commented 4 years ago

Excellent, thanks @mjenckel!

> 1. Do you have a recommendation to indicate whether the image is text-clipped or img-clipped?

Not really, no. Your current method of adding the text part as AlternativeImage (marked clipped) but not the non-text part, while differentiating them in the METS via an ID suffix, is okay for the existing spec I guess. But maybe you can make a proposal (preferably as PR on OCR-D/spec, or as discussion issue) yourself. I guess one would at least have to introduce new @comments classes (image features), e.g. something like text-only and nontext-only. I look forward to the discussion!
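
To make that concrete, with such (purely hypothetical, not-yet-specified) feature names the two variants could be annotated and selected unambiguously; a sketch under the same assumptions as above:

```python
from ocrd_models.ocrd_page import AlternativeImageType

# Hypothetical: if 'text-only'/'nontext-only' were adopted as features,
# tiseg could annotate both variants on the Page ...
page.add_AlternativeImage(AlternativeImageType(
    filename=text_path, comments='binarized,clipped,text-only'))
page.add_AlternativeImage(AlternativeImageType(
    filename=nontext_path, comments='binarized,clipped,nontext-only'))
# ... and a downstream segmenter could pick one deterministically:
page_image, _, _ = workspace.image_from_page(
    page, page_id, feature_selector='text-only')
```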

> 2. As for the "method of segmentation", I am not 100% sure what you mean.

Your current formulation is sufficient IMO. (I was again referring to the expectation of getting region types with coordinates, which this binary-only method defeats.)