OCR-D / ocrd_anybaseocr

DFKI Layout Detection for OCR-D
Apache License 2.0

block segmentation: overlaps and quality of prebuilt models #82

Open bertsky opened 3 years ago

bertsky commented 3 years ago

Once I got the block segmentation to actually run, I was puzzled over the extremely bad results of the provided model.

Here's how I gradually worked to isolate the problem.

[Screenshots: PageViewer renderings of the REGIONS-ANYOCR results for two pages (a: FILE_0001, b: FILE_0002), one pair per variant, in this order:]
1. bbox-best
2. bbox-all
3. mask-best
4. mask-all
5. mask-all-nms
6. mask-all-active
7. mask-all-active-nms

So all these refinements seem crucial.
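For readers unfamiliar with the "nms" variants: I take that suffix to mean non-maximum suppression applied to the predicted masks. The following is only a minimal sketch of greedy mask NMS under that assumption, not the code used in this repo. The names `mask_iou`, `mask_nms`, `rect` and the `iou_threshold` default are hypothetical, and masks are modelled as plain sets of pixel coordinates to keep the example dependency-free.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, column)


def mask_iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Intersection over union of two pixel sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mask_nms(masks: List[Set[Pixel]], scores: List[float],
             iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return the indices of the masks that survive.

    Candidates are visited from highest to lowest score; a candidate is
    dropped if it overlaps an already kept mask by more than iou_threshold.
    """
    order = sorted(range(len(masks)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(mask_iou(masks[i], masks[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


def rect(y0: int, x0: int, y1: int, x1: int) -> Set[Pixel]:
    """Hypothetical helper: a filled rectangle as a pixel set (y1/x1 exclusive)."""
    return {(y, x) for y in range(y0, y1) for x in range(x0, x1)}


# Toy example: the second candidate lies almost entirely inside the first.
candidates = [rect(0, 0, 10, 10), rect(1, 1, 10, 10), rect(20, 0, 30, 10)]
confidences = [0.9, 0.6, 0.8]
kept_indices = mask_nms(candidates, confidences)
```

In the toy example the lower-scoring, nested candidate is suppressed while the disjoint region is kept. Note that IoU-based suppression only helps when the overlapping candidates are near-duplicates; it does not resolve partial overlaps between genuinely different regions.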

But it appears that this model was trained on highly overlapping regions, which makes it next to impossible to avoid these overlaps during prediction. An equally serious problem seems to be the nature of the applied classification: footnotes just are not visually differentiable from other text regions (only textually/logically), so they'll just usurp all the energy of their look-alikes. IMHO an adequate modelling would treat this subclassification as a secondary task.
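The overlap hypothesis could be checked directly once the training annotations are available. Here is a rough sketch, assuming regions can be reduced to axis-aligned bounding boxes. The function names, the `min_share` threshold and the sample boxes are all made up for illustration, and nothing here reflects the actual dataset format.

```python
from itertools import combinations
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), with x1/y1 exclusive


def box_area(box: Box) -> int:
    """Area of a box; degenerate or inverted boxes count as 0."""
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def intersection_area(a: Box, b: Box) -> int:
    """Area shared by two boxes (0 if they do not intersect)."""
    return box_area((max(a[0], b[0]), max(a[1], b[1]),
                     min(a[2], b[2]), min(a[3], b[3])))


def overlapping_pair_ratio(boxes: List[Box], min_share: float = 0.1) -> float:
    """Share of box pairs whose intersection covers at least
    min_share of the smaller box of the pair."""
    pairs = list(combinations(boxes, 2))
    if not pairs:
        return 0.0
    hits = 0
    for a, b in pairs:
        smaller = min(box_area(a), box_area(b))
        if smaller and intersection_area(a, b) / smaller >= min_share:
            hits += 1
    return hits / len(pairs)


# Made-up page: the first two regions share a 10-pixel-high strip.
page_regions = [(0, 0, 100, 50), (0, 40, 100, 90), (0, 200, 100, 250)]
ratio = overlapping_pair_ratio(page_regions)
```

A high ratio across the training pages would support the suspicion that the overlaps are learned from the ground truth rather than being a prediction artifact.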

Hence, inevitably, we need to retrain this.

@n00blet @mahmed1995 @khurramHashmi @mjenckel can you please provide details about the training procedure and dataset you used? There's virtually nothing about this in the OCR-D reader, and your final DFG presentation poster only references one paper on page frame detection and one on dewarping. Am I correct in assuming this repo is where your training tools reside?