Open kba opened 5 years ago
@kba Before you fix this: I already did this in cisocrdgroup/cis-ocrd-py/ocrd_cis/ocropy. Please keep in mind there are lots of upcoming changes there, which you very likely want to pull here as well.
Can cisocrdgroup/cis-ocrd-py/ocrd_cis/ocropy wholly replace ocrd_ocropy? I would be open to that. Having the old ocropy codebase triplicated is nonsensical.
> Can cisocrdgroup/cis-ocrd-py/ocrd_cis/ocropy wholly replace ocrd_ocropy?
Yes, it will. In my WIP for ocropy binarization, deskewing and dewarping processors, I already relied on ocropy segmentation and other routines to improve recognition accuracy. (I shrink the textline masks after rotating coordinate polygons to the largest line component in page/region segmentation, so ascenders and descenders from other lines do not interfere anymore – which is otherwise a big problem for ocropy recognition if only rectangles are used, especially after deskewing.) I created a shared module for routines from ocropus-gpageseg, ocropus-nlbin and ocropus-rpred (and some under OLD/). So I guess this can be merged. I volunteer!
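To illustrate the mask shrinking described above, here is a minimal sketch, not the actual ocrd_cis code. It assumes the line crop is already binarized into a grid of 0/1 values, and it keeps only the largest 8-connected foreground component. The function name `largest_component_mask` is hypothetical.

```python
from collections import deque


def largest_component_mask(mask):
    """Return a copy of `mask` that keeps only its largest
    8-connected foreground component (hypothetical helper)."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    best = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            component = []
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Keep the first component found when sizes tie.
            if len(component) > len(best):
                best = component
    result = [[0] * width for _ in range(height)]
    for y, x in best:
        result[y][x] = 1
    return result


# Toy rectangular crop: the text line in the middle, plus fragments
# intruding from the neighbouring lines above and below.
crop = [
    [1, 1, 0, 0, 0, 0],  # descender fragment from the line above
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],  # the actual text line
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],  # ascender fragment from the line below
]
shrunk = largest_component_mask(crop)
```

In a real processor the component analysis would more likely run on NumPy arrays, but the idea is the same: fragments from adjacent lines fall outside the retained component, so they no longer reach the recognizer.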
A big problem is ocropus' assumption about image resolution, though. So many heuristics and fixed parameters expect 300 dpi! And we cannot just rescale the image, because we need pixel-wise accurate coordinate references in PAGE...
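One conceivable workaround, sketched here purely as an assumption and not as something ocropus or ocrd_cis does, is to leave the image untouched and scale the pixel-based parameters by the ratio of the actual resolution to the 300 dpi reference. The parameter names below are hypothetical placeholders, not real ocropus options.

```python
REFERENCE_DPI = 300  # resolution the fixed heuristics are assumed to be tuned for


def scale_pixel_params(params, dpi, reference_dpi=REFERENCE_DPI):
    """Scale pixel-valued parameters from the reference resolution to `dpi`.

    The image itself is not resampled, so coordinates stay pixel-accurate.
    """
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    factor = dpi / reference_dpi
    # Never let a size parameter collapse to zero pixels.
    return {name: max(1, round(value * factor)) for name, value in params.items()}


# Hypothetical defaults, expressed in pixels at 300 dpi.
defaults_300dpi = {"min_line_height": 20, "max_gap": 30}

params_600dpi = scale_pixel_params(defaults_300dpi, dpi=600)
params_150dpi = scale_pixel_params(defaults_300dpi, dpi=150)
```

This only helps for parameters that are plain pixel distances. Heuristics baked into filter shapes or trained models would need separate treatment.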
See https://github.com/OCR-D/core/issues/245#issuecomment-504015855