allenai / mmda

multimodal document analysis
Apache License 2.0
158 stars 18 forks

Egork/figure caption v2 #170

egork520 closed 1 year ago

egork520 commented 1 year ago

Vila figure/table caption prediction is more robust than layoutparser figure boundary prediction. So instead of merging all overlapping figure boxes by default, I now merge the figure boxes and then check whether the number of boxes remaining on the page is less than the number of figure/table captions. If it is, I abort the merge and keep the original, unmerged figure boxes.
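
For illustration, the check described above could be sketched roughly as follows. This is a minimal standalone sketch, not the code in this PR: the `Box` tuple layout and the function names (`merge_overlapping`, `merge_unless_captions_outnumber`) are hypothetical and are not part of the mmda API, and boxes are assumed to be axis-aligned `(x0, y0, x1, y1)` rectangles on a single page.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x0, y0, x1, y1), one page only.
Box = Tuple[float, float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes share a non-empty area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a: Box, b: Box) -> Box:
    """Return the smallest box that covers both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_overlapping(boxes: List[Box]) -> List[Box]:
    """Repeatedly union overlapping boxes until none overlap."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlaps(merged[i], merged[j]):
                    merged[i] = union(merged[i], merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan, since the list was modified
    return merged


def merge_unless_captions_outnumber(boxes: List[Box], num_captions: int) -> List[Box]:
    """Merge overlapping figure boxes, but abort the merge if it would leave
    fewer boxes on the page than there are predicted captions."""
    merged = merge_overlapping(boxes)
    if len(merged) < num_captions:
        # Captions are trusted more than box boundaries: keep the originals.
        return list(boxes)
    return merged


page_boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (100, 100, 110, 110)]
kept_merge = merge_unless_captions_outnumber(page_boxes, num_captions=2)
aborted_merge = merge_unless_captions_outnumber(page_boxes, num_captions=3)
```

With two captions, the two overlapping boxes are merged and two boxes remain. With three captions, merging would leave fewer boxes than captions, so the three original boxes are returned unchanged.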