Improve merging logic for overlapping figure boundaries from layoutparser

allenai / mmda

multimodal document analysis

Apache License 2.0

158 stars, 18 forks

Looking through some of the examples, I realized that merging all overlapping figure boxes by default might not be the right approach.

VILA's figure caption box predictions seem to be more robust than its figure box predictions. One way to use this is to tentatively merge the overlapping figure boxes and, if the resulting number of boxes is less than the number of captions, abort the merge and keep the original boxes.

The above approach increased the number of detected figures on the annotated set of papers.
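
The caption-count check described above can be sketched roughly as follows. This is only an illustration, not the mmda or layoutparser implementation: boxes are plain `(x1, y1, x2, y2)` tuples, the function names are hypothetical, and it assumes the figure boxes and the caption count come from the same page.

```python
from typing import List, Tuple

# Hypothetical plain-tuple box format: (x1, y1, x2, y2)
Box = Tuple[float, float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share a region of non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a: Box, b: Box) -> Box:
    """Smallest box that covers both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_overlapping(boxes: List[Box]) -> List[Box]:
    """Repeatedly fuse overlapping pairs until no two boxes overlap."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlaps(merged[i], merged[j]):
                    merged[i] = union(merged[i], merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan, since the grown box may overlap others
    return merged


def merge_figures_guarded(figure_boxes: List[Box], num_captions: int) -> List[Box]:
    """Merge overlapping figure boxes, but abort if that leaves fewer
    boxes than there are predicted captions."""
    merged = merge_overlapping(figure_boxes)
    if len(merged) < num_captions:
        return list(figure_boxes)  # merging would lose a figure; keep originals
    return merged
```

For example, two overlapping boxes plus one separate box collapse to two boxes when the page has two captions, but stay as three boxes when the page has three captions.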

comparison_table - comparison_table (1).csv

With the refined approach I get better numbers (more figures detected); see the attached screenshot (Screen Shot 2022-10-19 at 1 10 08 PM).

Improve merging logic for overlapping figure boundaries from layoutparser #165