allenai / mmda

multimodal document analysis
Apache License 2.0

Improve merging logic for overlapping figure boundaries from layoutparser #165

Closed egork520 closed 1 year ago

egork520 commented 1 year ago

Image

An example where layoutparser identifies multiple boxes as figures.

The current logic is to merge overlapping boxes into one big box. The proposal is to check nearby figure captions and use them to decide whether to merge.
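To make the current behavior concrete, here is a minimal sketch of merging overlapping boxes into one big box. It is not the mmda implementation: the `(x1, y1, x2, y2)` tuple representation and the names `overlaps`, `union`, and `merge_overlapping` are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical box representation: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """Return True if the two boxes share any interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a: Box, b: Box) -> Box:
    """Return the smallest box that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_overlapping(boxes: List[Box]) -> List[Box]:
    """Repeatedly merge any pair of overlapping boxes until none overlap."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlaps(merged[i], merged[j]):
                    # Replace the pair with its union and restart the scan,
                    # since the larger box may now overlap other boxes.
                    merged[i] = union(merged[i], merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged


boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (100, 100, 110, 110)]
result = merge_overlapping(boxes)
```

In this example the first two boxes overlap and collapse into one, while the third stays separate. The same collapse is what can wrongly fuse two distinct figures whose predicted boxes happen to overlap.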

Image

A more complicated example, where layoutparser finds too many bounding boxes

Where the result is

Image

egork520 commented 1 year ago

Looking through some of the examples, I realized that merging all overlapping figure boxes by default might not be the right approach.

VILA's figure caption box predictions seem to be more robust than the figure box predictions. One way to use this is to try merging the overlapping figure boxes and, if the number of merged boxes is less than the number of captions, abort the merge and keep the original boxes.
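The check described above could be sketched roughly as follows. This is an illustration under assumptions, not the code in mmda: the function name `merge_unless_figures_lost`, the tuple box representation, and the `merge_fn` parameter are all hypothetical, and `merge_all` is only a stand-in for whatever merging logic is actually used.

```python
from typing import Callable, List, Tuple

# Hypothetical box representation: (x1, y1, x2, y2).
Box = Tuple[float, float, float, float]


def merge_unless_figures_lost(
    figure_boxes: List[Box],
    caption_boxes: List[Box],
    merge_fn: Callable[[List[Box]], List[Box]],
) -> List[Box]:
    """Try the merge, but abort it if it leaves fewer figures than captions."""
    merged = merge_fn(figure_boxes)
    if len(merged) < len(caption_boxes):
        # Captions are treated as the more reliable signal: fewer merged
        # boxes than captions suggests distinct figures were fused.
        return list(figure_boxes)
    return merged


def merge_all(boxes: List[Box]) -> List[Box]:
    """Stand-in merge for the example: collapse everything into one box."""
    if not boxes:
        return []
    return [(
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )]


figures = [(0, 0, 10, 10), (8, 0, 20, 10)]
two_captions = [(0, 11, 10, 13), (10, 11, 20, 13)]
one_caption = [(0, 11, 20, 13)]

kept = merge_unless_figures_lost(figures, two_captions, merge_all)
fused = merge_unless_figures_lost(figures, one_caption, merge_all)
```

With two captions the merge is aborted and both figure boxes are kept; with one caption the merged box is accepted. Passing the merge step in as a function keeps the guard independent of how overlapping boxes are actually combined.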

The above approach increased the number of detected figures for the annotated set of papers.

comparison_table - comparison_table (1).csv

With the refined approach I get better numbers (more figures detected):

Screen Shot 2022-10-19 at 1 10 08 PM