OCR-D / ocrd_kraken

Wrapper for the kraken OCR engine
Apache License 2.0

ocrd-kraken-segment creates negative coordinates (=invalid PAGE) #34

Open stefanCCS opened 2 years ago

stefanCCS commented 2 years ago

Hi,

I have an example where ocrd-kraken-segment creates negative coordinates (= invalid PAGE). I just used:

ocrd resmgr download ocrd-kraken-segment blla.mlmodel
ocrd-kraken-segment -I <inputFileGrp> -O <outputFileGrp>

example.zip

As a result, I see:

<pc:TextRegion id="region_line_36">
            <pc:Coords points="3040,382 3040,-2 3219,-2 3219,382 3216,575 3037,569"/>
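
To check output like the above for offending points, a minimal sketch in Python can parse the `points` attribute and list every pair with a negative component. The helper names `parse_points` and `negative_points` are hypothetical and not part of ocrd_kraken or any PAGE library; the input string is the one quoted above.

```python
from typing import List, Tuple


def parse_points(points: str) -> List[Tuple[int, int]]:
    """Parse a PAGE-style points string ("x1,y1 x2,y2 ...") into integer pairs."""
    pairs = []
    for pair in points.split():
        x, y = pair.split(",")
        pairs.append((int(x), int(y)))
    return pairs


def negative_points(points: str) -> List[Tuple[int, int]]:
    """Return all points that have a negative x or y coordinate."""
    return [(x, y) for x, y in parse_points(points) if x < 0 or y < 0]


# The Coords value reported in this issue
coords = "3040,382 3040,-2 3219,-2 3219,382 3216,575 3037,569"
offending = negative_points(coords)
print(offending)
```

An empty result would mean that this particular polygon has no negative coordinates; it says nothing about other validity constraints of PAGE.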

bertsky commented 1 year ago

It no longer behaves this way: since #33, we clip all resulting polygons to the Border/canvas.
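
To illustrate what clipping a polygon to the canvas means, here is a minimal pure-Python sketch using the Sutherland-Hodgman algorithm against the rectangle [0, width] x [0, height]. This is not the code from #33 (which is not shown in this thread), the function name `clip_to_canvas` is hypothetical, and the canvas size of 4000 x 5000 is an assumption, since the real page dimensions are not given here.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _inside(p: Point, axis: int, bound: float, keep_ge: bool) -> bool:
    """True if p lies on the kept side of the line coord[axis] == bound."""
    return p[axis] >= bound if keep_ge else p[axis] <= bound


def _cross(p: Point, q: Point, axis: int, bound: float) -> Point:
    """Point where segment p-q crosses the line coord[axis] == bound."""
    t = (bound - p[axis]) / (q[axis] - p[axis])
    other = 1 - axis
    val = p[other] + t * (q[other] - p[other])
    return (bound, val) if axis == 0 else (val, bound)


def clip_to_canvas(polygon: List[Point], width: float, height: float) -> List[Point]:
    """Clip a polygon to the rectangle [0, width] x [0, height]."""
    # (axis, boundary value, keep points with coordinate >= boundary?)
    edges = [(0, 0.0, True), (0, float(width), False),
             (1, 0.0, True), (1, float(height), False)]
    output = list(polygon)
    for axis, bound, keep_ge in edges:
        clipped = []
        for i, cur in enumerate(output):
            prev = output[i - 1]  # wraps around to the last vertex for i == 0
            cur_in = _inside(cur, axis, bound, keep_ge)
            prev_in = _inside(prev, axis, bound, keep_ge)
            if cur_in != prev_in:
                # The edge prev-cur crosses the boundary: add the crossing point
                clipped.append(_cross(prev, cur, axis, bound))
            if cur_in:
                clipped.append(cur)
        output = clipped
    return output


# Polygon from the original report; canvas size is an assumed placeholder
region = [(3040, 382), (3040, -2), (3219, -2), (3219, 382), (3216, 575), (3037, 569)]
clipped_region = clip_to_canvas(region, width=4000, height=5000)
print(clipped_region)
```

With this input, the two vertices at y = -2 are replaced by the points where the polygon's edges cross y = 0. A geometry library such as Shapely can do the same via an intersection with the page rectangle, but only if the input polygon is valid.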

Unfortunately, in this case the raw polygons from Kraken cause trouble in Shapely:

INFO kraken.blla - Vectorizing regions
INFO kraken.blla - Vectorizing baselines
...
  File "/data/ocr-d/ocrd_all/venv/lib/python3.7/site-packages/ocrd_kraken/segment.py", line 85, in process
    res = self.segmenter(page_image)
  File "/data/ocr-d/ocrd_all/venv/lib/python3.7/site-packages/ocrd_kraken/segment.py", line 55, in segmenter
    return segment(img, **kwargs)
  File "/data/ocr-d/kraken/kraken/blla.py", line 315, in segment
    topline=net.user_metadata['topline'] if 'topline' in net.user_metadata else False)
  File "/data/ocr-d/kraken/kraken/blla.py", line 210, in vec_lines
    pol = calculate_polygonal_environment(baselines=[bl[1]], im_feats=im_feats, suppl_obj=suppl_obj, topline=topline)
  File "/data/ocr-d/kraken/kraken/lib/segmentation.py", line 710, in calculate_polygonal_environment
    bounds))
  File "/data/ocr-d/kraken/kraken/lib/segmentation.py", line 551, in _extract_patch
    polygon = np.array(roi_polygon.intersection(polygon).boundary.coords, dtype=int)
  File "/data/ocr-d/ocrd_all/venv/lib/python3.7/site-packages/shapely/geometry/base.py", line 582, in intersection
    return shapely.intersection(self, other, grid_size=grid_size)
  File "/data/ocr-d/ocrd_all/venv/lib/python3.7/site-packages/shapely/decorators.py", line 77, in wrapped
    return func(*args, **kwargs)
  File "/data/ocr-d/ocrd_all/venv/lib/python3.7/site-packages/shapely/set_operations.py", line 133, in intersection
    return lib.intersection(a, b, **kwargs)
shapely.errors.GEOSException: TopologyException: Input geom 1 is invalid: Self-intersection at 528.85981308411215 126.10280373831776

I guess this is caused by https://github.com/mittagessen/kraken/issues/319. (I was using shapely 2.0.1 here.)
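
For readers unfamiliar with this class of error, the following sketch shows how a self-intersecting polygon can be detected and repaired in Shapely before it is passed to a set operation such as `intersection`. It uses a made-up "bowtie" polygon, not the actual geometry from the traceback, and it is not a fix that Kraken or ocrd_kraken applies. It assumes a Shapely version that provides `shapely.validation.make_valid` (1.8 or later).

```python
from shapely.geometry import Polygon
from shapely.validation import explain_validity, make_valid

# A hypothetical "bowtie": its edges cross each other at (1, 1),
# which makes the polygon invalid in the OGC sense.
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])

print(bowtie.is_valid)            # validity flag of the raw polygon
print(explain_validity(bowtie))   # human-readable reason, incl. location

# make_valid returns a valid geometry covering the same area;
# the result may be a different geometry type than the input.
repaired = make_valid(bowtie)
print(repaired.is_valid, repaired.geom_type)
```

Whether an invalid input actually raises a `GEOSException` depends on the geometries involved and the GEOS version, so checking `is_valid` up front is the more predictable test.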

bertsky commented 1 year ago

It does help to downgrade shapely to 1.8.5.post1, though.