Fix for https://github.com/allenai/scholar/issues/36624
Adds a tool to get span groups from box groups using the new `is_overlap(center=True)` logic. This emulates LayoutParser's `.filter_by(center=True)`, which was used in local testing of the bib entry detector and gave better text than SPP, which used MMDA's default `_annotate_box_groups`.
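For reviewers unfamiliar with the two matching modes, here is a minimal sketch of the difference. It assumes axis-aligned boxes stored as (left, top, width, height) and assumes the default path behaves like a plain "any shared area" test. `Box`, `any_overlap` and `center_in` are hypothetical names for illustration only, not MMDA's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box: (l, t) is the top-left corner."""
    l: float
    t: float
    w: float
    h: float

    @property
    def center(self) -> tuple:
        return (self.l + self.w / 2, self.t + self.h / 2)


def any_overlap(a: Box, b: Box) -> bool:
    """True if the two boxes share any area (assumed default-style matching)."""
    return a.l < b.l + b.w and b.l < a.l + a.w and a.t < b.t + b.h and b.t < a.t + a.h


def center_in(token: Box, region: Box) -> bool:
    """True if the token's center lies inside the region (center=True-style matching)."""
    cx, cy = token.center
    return region.l <= cx <= region.l + region.w and region.t <= cy <= region.t + region.h


# A bib-entry region and a token from the next line that only grazes its bottom edge.
entry = Box(l=0.10, t=0.20, w=0.40, h=0.05)
grazing = Box(l=0.12, t=0.245, w=0.05, h=0.02)

print(any_overlap(grazing, entry))  # True: the token would leak into this entry
print(center_in(grazing, entry))    # False: its center (y=0.255) is below the entry
```

The center test is stricter at region boundaries, which is why tokens from adjacent entries stop bleeding into each other's text.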
Basically, this takes `_annotate_box_groups`, moves it to `tools.py`, and makes a couple of changes: it defaults to matching on token centers only, and it adds a little x padding so that narrow tokens like "[" are still grabbed. Without the extra padding, those tokens would be missed.
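A minimal sketch of why the x padding matters, assuming the padding widens each box-group box horizontally before the center test. The names `Box`, `Token`, `center_in`, `tokens_in_region` and the `pad_x=0.01` value are hypothetical and illustrative, not the signature in `tools.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box: (l, t) is the top-left corner."""
    l: float
    t: float
    w: float
    h: float


@dataclass(frozen=True)
class Token:
    text: str
    box: Box


def center_in(token: Box, region: Box, pad_x: float = 0.0) -> bool:
    """True if the token's center lies in the region widened by pad_x on each side."""
    cx = token.l + token.w / 2
    cy = token.t + token.h / 2
    return (region.l - pad_x <= cx <= region.l + region.w + pad_x
            and region.t <= cy <= region.t + region.h)


def tokens_in_region(tokens: list, region: Box, pad_x: float = 0.0) -> list:
    """Texts of tokens whose centers fall inside the (padded) region, in input order."""
    return [tok.text for tok in tokens if center_in(tok.box, region, pad_x)]


entry = Box(l=0.100, t=0.20, w=0.40, h=0.05)
tokens = [
    Token("[", Box(l=0.094, t=0.21, w=0.006, h=0.02)),  # narrow bracket overhanging the left edge
    Token("1]", Box(l=0.101, t=0.21, w=0.010, h=0.02)),
    Token("Smith", Box(l=0.115, t=0.21, w=0.040, h=0.02)),
]

print(tokens_in_region(tokens, entry))              # ['1]', 'Smith']
print(tokens_in_region(tokens, entry, pad_x=0.01))  # ['[', '1]', 'Smith']
```

Because a narrow token's center sits very close to its own edges, a detected box that clips it by even a few thousandths of the page width drops it entirely. A small horizontal pad recovers it without loosening the vertical test, so lines from neighbouring entries are still excluded.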