Egork/figure caption v2

VILA's figure/table caption predictions are more robust than layoutparser's figure boundary predictions. So instead of merging all overlapping figure boxes by default, I merge the figure boxes and then check whether the number of remaining boxes on the page is less than the number of figure/table captions. If it is, I abort the merge and leave the overlapping figure boxes unmerged.
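To illustrate the check described above, here is a minimal, self-contained sketch. It is not the code from this PR. The `(x1, y1, x2, y2)` box format and the helper names (`overlaps`, `union`, `merge_overlapping`, `resolve_figure_boxes`) are assumptions made for illustration only and are not part of the mmda API.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) in page coordinates.
Box = Tuple[float, float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """Return True if the two boxes share a region of non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a: Box, b: Box) -> Box:
    """Return the smallest box that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_overlapping(boxes: List[Box]) -> List[Box]:
    """Repeatedly merge any pair of overlapping boxes until none overlap."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlaps(merged[i], merged[j]):
                    merged[i] = union(merged[i], merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged


def resolve_figure_boxes(boxes: List[Box], num_captions: int) -> List[Box]:
    """Merge overlapping figure boxes unless that leaves fewer boxes than captions.

    The caption count is treated as the more trustworthy signal, so if merging
    would leave fewer boxes than captions, the original boxes are returned.
    """
    merged = merge_overlapping(boxes)
    if len(merged) < num_captions:
        return list(boxes)  # abort the merge
    return merged
```

For example, with two overlapping boxes and one separate box on a page, the merge is kept when two captions were predicted and aborted when three were predicted:

```python
page_boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (100, 100, 110, 110)]
resolve_figure_boxes(page_boxes, num_captions=2)  # two boxes, overlap merged
resolve_figure_boxes(page_boxes, num_captions=3)  # original three boxes kept
```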

allenai / mmda

Egork/figure caption v2 #170