OCR-D / ocrd_anybaseocr

DFKI Layout Detection for OCR-D
Apache License 2.0

Using other than the last AlternativeImage #19

Closed: mjenckel closed this issue 4 years ago

mjenckel commented 4 years ago

In our BlockSegmentation code we use the raw input rather than the processed AlternativeImages. We achieved that by using the "feature_filter" and filtering for all other processing steps. This works quite nicely; however, after adding the resulting TextRegions to the PAGE file like this:

        <pc:TextRegion type="paragraph">
            <pc:AlternativeImage filename="OCR-D-IMG-BLOCK-SEGMENT/OCR-D-IMG-BLOCK-SEGMENT_0001_0.png" comments=",blksegmented"/>
            <pc:Coords points="277,0 989,0 989,2022 277,2022"/>
        </pc:TextRegion>

we can't use them in any of the following processes. We get the following error:

16:42:50.204 WARNING ocrd_utils - crop coordinates ((1953, -222, 2454, 1690)) exceed image (1845x2324)

The problem seems to be that any future process assumes the coordinates added by BlockSegmentation should be transformed according to any previous process (e.g. cropping), even though the comments do not mention any previous computation for this region. @kba Is it also correct that, even though the mentioned AlternativeImages exist as files in the workspace, the processor prefers to calculate the regions from the image?
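The warning is a plain bounds check: the box about to be cropped, given as (left, top, right, bottom) in the frame of the image it is cropped from, reaches outside that image. A minimal stdlib sketch using the numbers from the log line above, assuming the image size is printed as width x height; `crop_exceeds_image` and `overshoot` are hypothetical helpers, not the ocrd_utils code:

```python
from typing import Dict, Tuple

BBox = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def crop_exceeds_image(bbox: BBox, size: Tuple[int, int]) -> bool:
    """Return True if the crop box reaches outside an image of (width, height).

    Hypothetical helper illustrating the condition behind the warning.
    """
    left, top, right, bottom = bbox
    width, height = size
    return left < 0 or top < 0 or right > width or bottom > height


def overshoot(bbox: BBox, size: Tuple[int, int]) -> Dict[str, int]:
    """Pixels by which each side of the box lies outside the image (0 if inside)."""
    left, top, right, bottom = bbox
    width, height = size
    return {
        "left": max(0, -left),
        "top": max(0, -top),
        "right": max(0, right - width),
        "bottom": max(0, bottom - height),
    }


box = (1953, -222, 2454, 1690)  # from the warning
size = (1845, 2324)             # assumed (width, height) of the cropped page image
print(crop_exceeds_image(box, size))  # True
print(overshoot(box, size))           # {'left': 0, 'top': 222, 'right': 609, 'bottom': 0}
```

Here the box starts above the image (negative top) and extends beyond its right edge, which is what you would expect if absolute coordinates were applied to an already cropped (and possibly rotated) image without the matching transform.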

For now we will change it so BlockSegmentation uses the latest AlternativeImage rather than the raw image as input.

wrznr commented 4 years ago

@bertsky I bet you know a way out of this, right?

bertsky commented 4 years ago

We achieved that by using the "feature_filter" and filtering for all other processing steps.

Where? I cannot see this on your current master or any other branch. It's important to see how exactly you were trying to do it.

filtering for all other processing steps

Why would you do that? Are you sure you want to rule out cropped, deskewed, dewarped etc.? For an NN model trained on raw images, I think it's enough to use feature_filter='binarized,grayscale_normalized'. As an extra, if your net is capable of utilising alpha channels, use transparency=True as well.

comments=",blksegmented"

This feature name violates the spec. More importantly, there is no such thing as a block-segmented derived image in the OCR-D processing model. You must add your segment coordinates and classes to the PAGE-XML, not the cropped image. (Cropped images are merely allowed as an extra, consistent with the coordinates. But since you say you already filter binarized/normalized images on the input side, I would strongly recommend against that. At the very least, the output images should be cropped from the fully-featured latest input images. Otherwise you will likely break workflow/user expectations.)
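The @comments of an AlternativeImage is a comma-separated list of feature names describing operations already applied to the image, so a segmentation step has no business there; the segmentation itself belongs in the PAGE-XML. The sketch below checks a comments string against a vocabulary and flags both the empty item produced by the leading comma and the unknown name. `KNOWN_FEATURES` is an assumed subset of the OCR-D PAGE conventions and may be incomplete; consult the spec for the authoritative list.

```python
from typing import List

# Assumed subset of the feature vocabulary from the OCR-D PAGE conventions;
# check the current OCR-D spec for the authoritative list.
KNOWN_FEATURES = {
    "binarized", "grayscale_normalized", "despeckled",
    "cropped", "deskewed", "rotated-90", "rotated-180", "rotated-270",
    "dewarped",
}


def invalid_features(comments: str) -> List[str]:
    """Return the items of a comma-separated @comments string that are not known features.

    Empty items (e.g. from a leading comma) are reported as "".
    """
    if comments == "":
        return []  # image without any derivation features
    return [item for item in comments.split(",") if item not in KNOWN_FEATURES]


print(invalid_features(",blksegmented"))               # ['', 'blksegmented']
print(invalid_features("binarized,cropped,deskewed"))  # []
```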

The problem seems to be that any future process assumes the coordinates added by BlockSegmentation should be transformed according to any previous process (e.g. cropping), even though the comments do not mention any previous computation for this region.

Any consumer of PAGE-XML must assume that a derived image (AlternativeImage) for a segment (page / region / line / word / glyph) depicts the minimal bounding rectangle w.r.t. the original image for that segment's coordinate polygon after upstream @orientation has been applied. This is the coordinate consistency principle. It follows naturally from the spec (although it has not been spelled out enough yet). Comments/features must always represent exactly the operations that have already been applied to the segment. (Otherwise you can end up with problems like double rotation or double cropping.)
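For the simple case without any @orientation, the coordinate consistency principle boils down to two steps: the derived image of a segment shows the minimal bounding rectangle of its absolute polygon, and a consumer working on a cropped parent image shifts absolute points by the parent's offset. A minimal sketch under that assumption (rotation is deliberately left out; the helper names and the offset below are hypothetical, not the core API):

```python
from typing import List, Tuple

Point = Tuple[float, float]


def parse_points(points: str) -> List[Point]:
    """Parse a PAGE Coords/@points string like "277,0 989,0" into (x, y) tuples."""
    return [tuple(float(v) for v in pair.split(",")) for pair in points.split()]


def bbox_of_polygon(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Minimal axis-aligned bounding rectangle (left, top, right, bottom) of a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def absolute_to_relative(polygon: List[Point], offset: Point) -> List[Point]:
    """Shift absolute points into the frame of a parent image cropped at `offset` (no rotation)."""
    ox, oy = offset
    return [(x - ox, y - oy) for x, y in polygon]


region = parse_points("277,0 989,0 989,2022 277,2022")
print(bbox_of_polygon(region))                             # (277.0, 0.0, 989.0, 2022.0)
print(absolute_to_relative(region, (100.0, 50.0))[0])      # (177.0, -50.0)
```

A negative relative coordinate, as in the last line, means the segment starts outside the cropped parent, which is exactly the situation that triggers the crop warning.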

Some of those features are monotonic across levels, whereas others like deskewed, rotated-90, dewarped are level-local. See here for an explanation why the latter is unavoidable.

Is it also correct that, even though the mentioned AlternativeImages exist as files in the workspace, the processor prefers to calculate the regions from the image?

No, the two image API functions in core (image_from_page and image_from_segment) will always try to give you the last derived image that satisfies all required features (positive and negative). If some operations are missing that core can apply itself, it does so.
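The selection rule described here can be sketched as: walk the AlternativeImage list from the end and take the first image whose features include everything in the selector and nothing in the filter; if none qualifies, fall back to the original image. This is a simplified model for illustration, not the code in core; the function, the data layout and the filenames are hypothetical.

```python
from typing import List, Optional, Set, Tuple


def parse_features(comments: str) -> Set[str]:
    """Split a comma-separated @comments string into a set of feature names."""
    return {item.strip() for item in comments.split(",") if item.strip()}


def select_alternative_image(
    images: List[Tuple[str, str]],  # (filename, comments) in document order
    feature_selector: str = "",
    feature_filter: str = "",
) -> Optional[str]:
    """Return the last image with all selected and none of the filtered features.

    None means no derived image qualifies, so start from the original image.
    """
    required = parse_features(feature_selector)
    forbidden = parse_features(feature_filter)
    for filename, comments in reversed(images):
        features = parse_features(comments)
        if required <= features and not (forbidden & features):
            return filename
    return None


images = [
    ("OCR-D-BIN.png", "binarized"),
    ("OCR-D-DESKEW.png", "binarized,deskewed"),
    ("OCR-D-CROP.png", "binarized,deskewed,cropped"),
]
print(select_alternative_image(images))                              # OCR-D-CROP.png
print(select_alternative_image(images, feature_filter="cropped"))    # OCR-D-DESKEW.png
print(select_alternative_image(images, feature_filter="binarized"))  # None
```

In this model, filtering only `binarized` leaves no qualifying derived image, so processing starts from the original image; core can then still apply Border cropping and @orientation by itself, which would explain getting a 'cropped,deskewed' result instead of the raw scan.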

mjenckel commented 4 years ago

Where? I cannot see this on your current master or any other branch. It's important to see how exactly you were trying to do it.

We tried it locally in the way you described as well. Since it didn't work, we didn't push it to master. We actually had to use feature_filter='binarized,cropped,deskewed'. If we only used binarized, we would get outputs with comment='cropped,deskewed' rather than the raw image. When possible it seems to apply all stored operations.

comments=",blksegmented"

This feature name violates the spec.

Noted, and will be fixed in the next commit.

minimal bounding rectangle w.r.t. the original image

While, as mentioned in #15, we didn't transform coordinates properly in all cases, that was not the problem here. What we did to create this problem is the following:

  1. We applied Binarization, Deskewing, Cropping and tiseg (no Dewarping to stay consistent) to an image and saved the respective AlternativeImages / PAGE-XML outputs. In the case of Cropping these coordinates are in the page-level coordinate system.

  2. We applied Block-Segmentation, but we used feature-filter so it uses the raw input image. The coordinates we store are therefore also on the page-level without taking cropping or deskewing into account. This is as you mentioned in #15 the proper way to store the results.

  3. We tried to apply TextlineExtraction on the output of Block-Segmentation. The error was a message from OCR-D core saying that cropping had failed because the coordinates are out of range of the cropped image:

    `WARNING ocrd_utils - crop coordinates ((777, -76, 1325, 1670)) exceed image (1845x2324)`

    This seems to happen because the detected TextRegions are slightly outside the crop area. For comparison, the detected text region with coordinates:

    <pc:TextRegion id="OCR-D-PAGE-TISEG_0001_region0000" type="paragraph">
        <pc:AlternativeImage filename="OCR-D-IMG-BLOCK-SEGMENT/OCR-D-IMG-BLOCK-SEGMENT_0001_0.png" comments=""/>
        <pc:Coords points="212,86 961,86 961,1978 212,1978"/>
    </pc:TextRegion>

    and the detected border:

    <pc:Border>
        <pc:Coords points="75,231 1920,223 1930,2547 85,2555"/>
    </pc:Border>
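Ignoring the page's skew for a moment, the mismatch can be reproduced from the two polygons alone: expressed relative to the top-left corner of the Border's bounding box (which is what a crop-only transform does), the region's top edge lands above the cropped image. A stdlib sketch with the coordinates above; the figures in the warning differ, presumably because deskewing is applied on top of the crop:

```python
from typing import List, Tuple


def parse_points(points: str) -> List[Tuple[int, int]]:
    """Parse a PAGE Coords/@points string into integer (x, y) tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def bbox(polygon: List[Tuple[int, int]]) -> Tuple[int, int, int, int]:
    """Axis-aligned bounding box (left, top, right, bottom) of a polygon."""
    xs, ys = zip(*polygon)
    return min(xs), min(ys), max(xs), max(ys)


region = parse_points("212,86 961,86 961,1978 212,1978")
border = parse_points("75,231 1920,223 1930,2547 85,2555")

bx0, by0, bx1, by1 = bbox(border)  # crop box of the page image
rx0, ry0, rx1, ry1 = bbox(region)
relative = (rx0 - bx0, ry0 - by0, rx1 - bx0, ry1 - by0)
print((bx0, by0, bx1, by1))  # (75, 223, 1930, 2555)
print(relative)              # (137, -137, 886, 1755): top < 0, i.e. outside the crop
```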

Would it be possible to change the behavior of image_from_segment to "apply" cropping to these regions not just by shifting their coordinates, but also by cropping the TextRegions to the BorderRegion if they are outside? Maybe alongside a warning that TextRegions are being shrunk to match the BorderRegion. Ideally, the TextRegions would of course never exceed the BorderRegion, but this can happen if the cropping cuts off some text, or, as in our case, if we encourage TextRegions to be too large rather than too small (and cut off text).

For now we will change it so BlockSegmentation uses the latest AlternativeImage rather than the raw image as input.

We actually tried this, but the performance loss of our BlockSegmentation is too big to be a viable option if used after Binarization or Cropping. This means we also can't use it after Dewarping, since our Dewarping requires binarized images.

bertsky commented 4 years ago

Where? I cannot see this on your current master or any other branch. It's important to see how exactly you were trying to do it.

We tried it locally in the way you described as well. Since it didn't work, we didn't push it to master.

That's just as well. But you should push it publicly to some feature branch or PR (for which you can request my review), so that we can talk about actual things, not feats of my imagination.

We actually had to use feature_filter='binarized,cropped,deskewed'. If we only used binarized we would get outputs with comment='cropped,deskewed' rather than the raw image. When possible it seems to apply all stored operations.

You said so earlier, but not why. I explained how to use coordinates_for_segment in #15 – that should make the difference.
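The point of coordinates_for_segment is the inverse direction: whatever a processor detects on a derived image must be mapped back to absolute page coordinates before it is written to PAGE-XML. For the crop-only case (no @orientation) that means adding the crop offset back. The helper below is hypothetical and ignores rotation, which the real function also has to undo:

```python
from typing import List, Tuple

Point = Tuple[int, int]


def relative_to_absolute(polygon: List[Point], crop_offset: Point) -> List[Point]:
    """Map points detected on a cropped image back to absolute page coordinates.

    Crop-only sketch: rotation (@orientation) is not handled here.
    """
    ox, oy = crop_offset
    return [(x + ox, y + oy) for x, y in polygon]


def to_points(polygon: List[Point]) -> str:
    """Serialize (x, y) tuples into a PAGE Coords/@points string."""
    return " ".join(f"{x},{y}" for x, y in polygon)


# A region found at (137, 10)..(886, 1755) on an image cropped at offset (75, 223):
detected = [(137, 10), (886, 10), (886, 1755), (137, 1755)]
print(to_points(relative_to_absolute(detected, (75, 223))))
# 212,233 961,233 961,1978 212,1978
```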

1. We applied Binarization, Deskewing, Cropping and tiseg (no Dewarping to stay consistent) to an image and saved the respective AlternativeImages / PAGE-XML outputs. 

That sounds like you were also using these images in the respective next consumer (i.e. binarized image for deskewing, binarized+deskewed image for cropping, binarized+deskewed+cropped image for segmentation). But is that really the case?

In the case of Cropping these coordinates are in the page-level coordinate system.

I don't know how I should read this. PAGE coordinates are always absolute. (Please don't use page-level to refer to absolute, because cropped images can also be on the page level, but their coordinate system is not absolute anymore.)

2. We applied Block-Segmentation, but we used `feature-filter` so it uses the raw input image. The coordinates we store are therefore also on the page-level without taking cropping or deskewing into account. This is as you mentioned in #15 the proper way to store the results.

Okay, so at least for segmentation you were ignoring all the derived images produced so far. You are correct: no coordinate conversion is necessary in that case. (But segmentation cannot benefit from deskewing and cropping either. Maybe if you trained your net with augmented skewing, and with uncropped, maybe even crop-masked images, then that would not hurt. But did you?)

3. We tried to apply TextlineExtraction on the output of Block-Segmentation. The error was a message from OCR-D core saying that cropping had failed because the coordinates are out of range of the cropped image:
    `WARNING ocrd_utils - crop coordinates ((777, -76, 1325, 1670)) exceed image (1845x2324)`

You didn't say at what level this happened and what filters/selectors you were using. I assume you mean you did some image_from_page() without filtering (which gave you the binarized, deskewed and cropped page image) and then passed that on to image_from_segment() without filtering on the region level (which should eventually try to use OCR-D-IMG-BLOCK-SEGMENT_0001_0.png here, but first tries to crop a region image itself to be on the safe side).

This seems to happen because the detected TextRegions are slightly outside the crop area. For comparison, the detected text region with coordinates:

    <pc:TextRegion id="OCR-D-PAGE-TISEG_0001_region0000" type="paragraph">
        <pc:AlternativeImage filename="OCR-D-IMG-BLOCK-SEGMENT/OCR-D-IMG-BLOCK-SEGMENT_0001_0.png" comments=""/>
        <pc:Coords points="212,86 961,86 961,1978 212,1978"/>
    </pc:TextRegion>

and the detected border:

    <pc:Border>
        <pc:Coords points="75,231 1920,223 1930,2547 85,2555"/>
    </pc:Border>

Yes, this could explain the problem. The figures are not exactly the same – but you have a skew angle on the page as well, right? (So in detail what would happen is that image_from_page crops according to Border and then applies @orientation. That image along with its coordinate transform gets passed to image_from_segment. This transforms the region coordinates to relative and tries to crop that bbox from the page image.)
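The chain described here (crop to Border, then rotate by @orientation, then express the region in that frame) is a composition of affine transforms. The sketch below composes them as 3x3 matrices in plain Python. The rotation convention (counter-clockwise on screen, canvas expanded to fit the rotated crop, similar to PIL's `Image.rotate(angle, expand=True)`) is an assumption chosen for illustration, so the numbers will not necessarily match core's exactly; all names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """3x3 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def translate(dx: float, dy: float) -> Matrix:
    return [[1, 0, dx], [0, 1, dy], [0, 0, 1]]


def rotate_ccw(angle_deg: float) -> Matrix:
    """Counter-clockwise rotation (as seen on screen) in image coordinates, y pointing down."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, s, 0], [-s, c, 0], [0, 0, 1]]


def apply_transform(m: Matrix, point: Tuple[float, float]) -> Tuple[float, float]:
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2], m[1][0] * x + m[1][1] * y + m[1][2])


def crop_then_rotate(crop_box, angle_deg: float):
    """Absolute -> relative transform for an image cropped to crop_box (left, top,
    right, bottom), then rotated about its center with the canvas expanded to fit."""
    x0, y0, x1, y1 = crop_box
    w, h = x1 - x0, y1 - y0
    rot = rotate_ccw(angle_deg)
    # size of the expanded canvas = extent of the rotated crop rectangle
    corners = [apply_transform(rot, (cx - w / 2, cy - h / 2))
               for cx, cy in [(0, 0), (w, 0), (w, h), (0, h)]]
    new_w = max(x for x, _ in corners) - min(x for x, _ in corners)
    new_h = max(y for _, y in corners) - min(y for _, y in corners)
    m = translate(-x0, -y0)                          # 1. crop: shift to the box corner
    m = matmul(translate(-w / 2, -h / 2), m)         # 2. move origin to the crop center
    m = matmul(rot, m)                               # 3. rotate
    m = matmul(translate(new_w / 2, new_h / 2), m)   # 4. origin to the new canvas corner
    return m, (new_w, new_h)


m, size = crop_then_rotate((75, 223, 1930, 2555), 0.0)
print(apply_transform(m, (212, 86)))  # (137.0, -137.0): with no rotation, only the crop shift
```

With a non-zero angle, all four corners of the region polygon would be transformed and their bounding box taken, which is why the crop box in the warning differs from the plain crop-only numbers.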

But without the actual code or full example, this is all guesswork.

Would it be possible to change the behavior of image_from_segment to "apply" cropping to these regions not just by shifting their coordinates, but also by cropping the TextRegions to the BorderRegion if they are outside? Maybe alongside a warning that TextRegions are being shrunk to match the BorderRegion.

No, that would lead to downstream inconsistency again. Who can still tell that the derived image now annotated does not represent the bbox of the region's polygon, but of a modified polygon? Core might do the same trick each time when calculating the coordinate transform for that image, but other implementations might not be aware of it. What core does here is correct: it gives you the requested image, but everything outside Border will be blank.
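What "everything outside Border will be blank" means for a region that sticks out can be sketched with a tiny image stored as nested lists: the requested bbox is honoured in full, and pixels falling outside the parent image are filled with a background value instead of failing. This is a toy model (pure Python lists, hypothetical `crop_with_fill`), not how core represents images:

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of grayscale pixel values


def crop_with_fill(image: Image, bbox: Tuple[int, int, int, int], fill: int = 255) -> Image:
    """Crop (left, top, right, bottom) from image; pixels outside the image get `fill`."""
    left, top, right, bottom = bbox
    height, width = len(image), len(image[0])
    return [
        [image[y][x] if 0 <= x < width and 0 <= y < height else fill
         for x in range(left, right)]
        for y in range(top, bottom)
    ]


page = [[0, 0, 0],
        [0, 0, 0]]  # a 3x2 all-black "page" image
print(crop_with_fill(page, (1, -1, 4, 1)))
# [[255, 255, 255], [0, 0, 255]]
```

The region image keeps the size of the region's bbox, so its coordinate transform stays consistent with the unmodified polygon; only the content outside the parent is blank.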

Note that no element in PAGE is allowed to exceed its parent element. You must fix this in/after the producer of the coordinates (your segmentation). And make sure the region image also matches the fixed polygon!
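Fixing the polygon in the producer amounts to intersecting each region polygon with its parent (here the Border). Since a Border quadrilateral like the one quoted above is convex, Sutherland-Hodgman clipping is sufficient; the plain-Python sketch below uses it (a geometry library such as Shapely would be the more robust choice for arbitrary polygons). Function names are hypothetical, and the region image would then have to be re-cropped from the clipped polygon's bbox.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _cross(a: Point, b: Point, p: Point) -> float:
    """z-component of (b - a) x (p - a); its sign tells which side of line a->b p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _signed_area2(poly: List[Point]) -> float:
    """Twice the signed area (shoelace formula); the sign gives the orientation."""
    n = len(poly)
    return sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
               for i in range(n))


def _intersect(p: Point, q: Point, a: Point, b: Point) -> Point:
    """Intersection of segment p->q with the infinite line through a and b."""
    cp, cq = _cross(a, b, p), _cross(a, b, q)
    t = cp / (cp - cq)
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))


def clip_to_parent(subject: List[Point], clip: List[Point]) -> List[Point]:
    """Sutherland-Hodgman: clip `subject` against the convex polygon `clip`."""
    sign = 1 if _signed_area2(clip) > 0 else -1
    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        candidates, output = output, []
        if not candidates:
            break  # subject lies entirely outside the parent
        prev = candidates[-1]
        for cur in candidates:
            cur_in = sign * _cross(a, b, cur) >= 0
            prev_in = sign * _cross(a, b, prev) >= 0
            if cur_in:
                if not prev_in:
                    output.append(_intersect(prev, cur, a, b))
                output.append(cur)
            elif prev_in:
                output.append(_intersect(prev, cur, a, b))
            prev = cur
    return output


def to_points(poly: List[Point]) -> str:
    """Serialize to a PAGE Coords/@points string with integer coordinates."""
    return " ".join(f"{round(x)},{round(y)}" for x, y in poly)


region = [(212, 86), (961, 86), (961, 1978), (212, 1978)]
border = [(75, 231), (1920, 223), (1930, 2547), (85, 2555)]
print(to_points(clip_to_parent(region, border)))
# 212,230 961,227 961,1978 212,1978
```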

Maybe alongside a warning that TextRegions are being shrunk to match the BorderRegion. Ideally, the TextRegions would of course never exceed the BorderRegion, but this can happen if the cropping cuts off some text, or, as in our case, if we encourage TextRegions to be too large rather than too small (and cut off text).

If you don't want to do it in your segmentation processor, then you could use ocrd-segment-repair with the sanitize option. But again: your image will have to be updated as well, so it's better to do that in the segmentation processor right away.

For now we will change it so BlockSegmentation uses the latest AlternativeImage rather than the raw image as input.

We actually tried this, but the performance loss of our BlockSegmentation is too big to be a viable option if used after Binarization or Cropping. This means we also can't use it after Dewarping, since our Dewarping requires binarized images.

What do you mean by performance: computation time or quality?

If the former: Why do you attribute the time required for binarization and cropping to the segmentation processor? These are independent steps, they each have their own "quota". Where to invest resources for improved quality is, after all, up to the workflow designer to decide.

mjenckel commented 4 years ago

@bertsky I created a branch "region_cropping_problem". In that branch I also added a "pipeline.py" with a test page that should allow you to reproduce the error. TISEG is in the pipeline but doesn't actually do anything, since there are no images on the test page. I added the transformation to absolute coordinates to the cropping step, so that should not be a problem. I also added a method that shrinks any TextRegions to the BorderRegion, although, since we computed the TextRegions on the raw image, we didn't consider the BorderRegion as their parent. I think we probably should anyway (if it exists)?

What do you mean performance, computation time or quality?

I was talking about the quality.

P.S.: This branch doesn't use the latest version of our BlockSegmentation. I just use it to reliably reproduce the problem, so I can fix it.

wrznr commented 4 years ago

@mjenckel Were you able to fix the initial problem?

mahmed1995 commented 4 years ago

Fixed.