Open minump opened 4 months ago
Small boxes (width < 5) are usually references, e.g. [2]. The scale (canvasWidth / pageWidth) is usually 1 or close to 1 (0.9, 0.87, etc.).
Grobid keeps references (superscripts and subscripts) as part of the sentence text, but gives each reference its own box/coordinates, separate from the rest of the sentence. That separate box is usually very small (width < 5).
Code for small boxes is in the feature-branch, but it is not integrated yet.
Find small rectangles in the grobid coordinates. This is to check whether grobid misses sentences or gives wrong coordinates for them.
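
A rough sketch of the small-box check, for illustration only (this is not the code in the feature-branch). It assumes the coordinates come as a Grobid `coords` string of the form `page,x,y,width,height`, with multiple boxes separated by `;`. The names `Box`, `parse_coords`, `find_small_boxes`, and the `scale` parameter are hypothetical, and applying the scale to the width before comparing is also an assumption.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    page: int
    x: float
    y: float
    width: float
    height: float


def parse_coords(coords: str) -> List[Box]:
    """Parse a coords string "page,x,y,w,h;page,x,y,w,h;..." into boxes."""
    boxes = []
    for part in coords.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate empty input or a trailing ';'
        page, x, y, w, h = part.split(",")
        boxes.append(Box(int(page), float(x), float(y), float(w), float(h)))
    return boxes


def find_small_boxes(coords: str, max_width: float = 5.0,
                     scale: float = 1.0) -> List[Box]:
    """Return boxes whose scaled width is below max_width.

    scale stands in for canvasWidth / pageWidth (assumed to be close to 1).
    """
    return [b for b in parse_coords(coords) if b.width * scale < max_width]


# Made-up example: one sentence box plus a tiny box for a superscript reference.
sentence_coords = "1,72.0,100.0,200.5,9.5;1,272.5,98.0,3.2,6.0"
small = find_small_boxes(sentence_coords)
```

Here `small` holds only the second box, which could then be compared against the sentence text to see whether it really is a reference.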