allenai / mmda

multimodal document analysis
Apache License 2.0

Merge Spans per page when merging spans by symbol distance #210

Closed geli-gel closed 1 year ago

geli-gel commented 1 year ago

Group Spans by page when merging SpanGroup Spans by symbol distance, to avoid the "boxes are on different pages" error.

This error comes up when annotating Grobid BoxGroups onto a Document: some BoxGroups contain boxes on different pages, and the spans of those boxes can still be close together in symbol distance, so they get merged across the page boundary.
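A minimal sketch of the idea, for illustration only and not the actual diff in this PR. The `Span` dataclass here is a hypothetical stand-in (with an explicit `page` field), and `merge_spans_per_page` and `max_distance` are assumed names rather than mmda's API:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    # Hypothetical stand-in for a span: symbol offsets plus the page of its box.
    start: int
    end: int
    page: int


def merge_spans_per_page(spans: List[Span], max_distance: int = 1) -> List[Span]:
    # Bucket spans by page first, so spans on different pages are never merged.
    by_page: Dict[int, List[Span]] = defaultdict(list)
    for span in spans:
        by_page[span.page].append(span)

    merged: List[Span] = []
    for page in sorted(by_page):
        current: Optional[Span] = None
        for span in sorted(by_page[page], key=lambda s: s.start):
            # Merge only when the gap in symbols is within max_distance.
            if current is not None and span.start - current.end <= max_distance:
                current.end = max(current.end, span.end)
            else:
                current = Span(span.start, span.end, page)
                merged.append(current)
    return merged


spans = [Span(0, 5, page=0), Span(6, 10, page=0), Span(11, 15, page=1)]
result = merge_spans_per_page(spans, max_distance=1)
```

In this example the first two spans merge because they share a page and are one symbol apart. The third span is also one symbol away from the merged span, but it sits on the next page, so it stays separate.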

related to allenai/scholar/issues/35751

geli-gel commented 1 year ago

@kyleclo also passed tt verify for bibentry_detection_predictor, ivila-row-layoutlm-finetuned-s2vl-v2, bibentry_predictor_mmda, and figure_table_predictors.

kyleclo commented 1 year ago

@geli-gel can you also increment the version in pyproject?