allenai / mmda

multimodal document analysis
Apache License 2.0
158 stars 18 forks source link

Egork/merge spans #139

Closed egork520 closed 2 years ago

egork520 commented 2 years ago

Adding a class to the utils for merging spans with optional parameter of x, y. x, y are used as a distance added to the boundaries of the boxes to decided if they overlap.

Here is an example of tokens which are represented by list of spans, the task is to merge them into a single span and box image

Result of merging tokens with x=0.04387334, y=0.01421097 being average size of the token in the document image

Another example with same x=0.04387334, y=0.01421097

image image

kyleclo commented 2 years ago

thanks @comorado , this looks similar to this function that also exists in that same utils/tools.py: https://github.com/allenai/mmda/blob/ceead0fcd65f970c40ff05775322add52ce33686/mmda/utils/tools.py#L6

here's the test: https://github.com/allenai/mmda/blob/ceead0fcd65f970c40ff05775322add52ce33686/tests/test_utils/test_tools.py#L16

can you explain what the differences are?

egork520 commented 2 years ago

It does looks the same was not aware of this function, except the differences in the implementation.

Here are the differences:

In the PR above

I need this functionality so that I can find boxes corresponding to the figure caption and merge figures which do have an overlap

kyleclo commented 2 years ago

thanks @comorado , do u think it'd be possible to consolidate these two methods to a single one, since they're very highly similar? it sounds like:

egork520 commented 2 years ago

thanks @comorado , do u think it'd be possible to consolidate these two methods to a single one, since they're very highly similar? it sounds like:

  • your approach which produces a single "union" box is better than the previous method
  • your approach that doesn't assume non-overlap is better than the previous method
  • both approaches have valid ways one might want to 'merge' (i.e. distance in symbol vs distance in box coordinate). maybe they should be 2 different names: merge_neighbor_spans_by_symbol_distance() and merge_neighbor_spans_by_box_coordinate()

I could combine both in a single implementation. I like two distinct names. Will work on updated int the PR