Closed wrznr closed 5 years ago
The `help-wanted` tag has been set to get help with an iterator-based implementation for region segmentation.
The result is still not completely satisfying, since the Travis build fails. This will be handled by merging https://github.com/OCR-D/core/pull/241, which needs review.
This commit proposes to use Tesseract's `SetRectangle` function to restrict the region detection to the area defined by the element `Border`. After thorough in(tro)spection, it turned out that `GetComponentImages` does not respect the manually defined recognition area when constructing the coordinates of the identified boxes: https://github.com/tesseract-ocr/tesseract/blob/4b397c70cc7d2aef2e50cdb9581b7e10f789ec3d/src/api/baseapi.cpp#L736

Therefore, a manual shift had to be added. This solution is not completely satisfying. In the long run, Tesseract's own iterators should be employed, especially when it comes to adding regions of types other than text.

Fixes https://github.com/OCR-D/ocrd_tesserocr/issues/32
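
For reviewers, the manual shift mentioned above could look roughly like the following. This is a minimal sketch and not the code in this commit: `shift_boxes` and the sample values are hypothetical, and the box dictionaries (`x`, `y`, `w`, `h`) only mimic the shape of what tesserocr's `GetComponentImages` yields. It assumes, as described above, that the returned coordinates are relative to the rectangle passed to `SetRectangle`.

```python
def shift_boxes(boxes, left, top):
    """Translate boxes from rectangle-relative to page coordinates.

    `boxes` is a list of dicts with keys x, y, w, h (hypothetical sample
    data shaped like tesserocr box dicts); `left` and `top` are the page
    coordinates of the upper-left corner of the restricted area.
    """
    return [
        {**box, "x": box["x"] + left, "y": box["y"] + top}
        for box in boxes
    ]


# Assumed upper-left corner of the Border rectangle on the page.
border_left, border_top = 100, 200

# One box as it might come back relative to the restricted rectangle.
relative = [{"x": 10, "y": 20, "w": 300, "h": 50}]

# Width and height are unaffected; only the origin moves.
absolute = shift_boxes(relative, border_left, border_top)
```

Only the origin is shifted; width and height stay as reported. An iterator-based implementation would ideally make this step unnecessary.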