Big PR. A lot of it is actually just auto linting/formatting. Here are the main changes:
For Box, I added some validation checks to make sure problematic Boxes are caught at creation time:

    if w < 0 or h < 0:
        raise ValueError(f"Width and height must be non-negative, got {w} and {h}")
    if page < 0:
        raise ValueError(f"Page must be non-negative, got {page}")
    if l < 0 or t < 0:
        raise ValueError(f"Left and top must be non-negative, got {l} and {t}")

This has been helpful for debugging, and the library assumes "nice" Boxes.
For Box, I added another assertion for the is_overlap() method:
For Span, I added a merge_boxes: bool = True option to small_spans_to_big_span(). The default preserves the existing behavior, but we now have the option to skip merging and create a big Span that has no larger Box. I would prefer that not merging be the default behavior, but I think that might break too many things, so maybe in the future.
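To illustrate the intent of the flag, here is a rough sketch. The `Box` and `Span` classes below are simplified stand-ins I made up for this example, not the library's real ones, and the function is written as a free function rather than however it is actually attached to Span. It also assumes all boxes are on the same page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:  # hypothetical stand-in, not the library's Box
    l: float
    t: float
    w: float
    h: float
    page: int


@dataclass
class Span:  # hypothetical stand-in, not the library's Span
    start: int
    end: int
    box: Optional[Box] = None


def small_spans_to_big_span(spans: List[Span], merge_boxes: bool = True) -> Span:
    """Combine several small Spans into one Span covering all of them."""
    start = min(s.start for s in spans)
    end = max(s.end for s in spans)
    boxes = [s.box for s in spans if s.box is not None]
    big_box = None
    if merge_boxes and boxes:
        # Smallest box enclosing every input box (assumes a single page).
        left = min(b.l for b in boxes)
        top = min(b.t for b in boxes)
        right = max(b.l + b.w for b in boxes)
        bottom = max(b.t + b.h for b in boxes)
        big_box = Box(left, top, right - left, bottom - top, boxes[0].page)
    return Span(start, end, big_box)


spans = [Span(0, 5, Box(0, 0, 10, 5, 0)), Span(6, 10, Box(20, 0, 10, 5, 0))]
merged = small_spans_to_big_span(spans)                        # big Span with enclosing Box
unmerged = small_spans_to_big_span(spans, merge_boxes=False)   # big Span, box is None
```

With merge_boxes=False the resulting Span still covers the full character range; it just carries no enclosing Box.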
For Box and Span, I added the utility methods cluster_boxes() and cluster_spans(). Clustering is based on overlap.
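As a sketch of what overlap-based clustering can look like: the version below groups boxes into connected components, so two boxes land in the same cluster if they overlap directly or through a chain of overlapping boxes. This is an assumption about the semantics for illustration only. The tuple representation, the strict-inequality overlap test (touching edges do not count), and the index-list return value are all hypothetical choices, not the PR's actual code.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (left, top, width, height), all on one page.
BoxT = Tuple[float, float, float, float]


def is_overlap(a: BoxT, b: BoxT) -> bool:
    """True if the two boxes share any area (touching edges do not count)."""
    al, at, aw, ah = a
    bl, bt, bw, bh = b
    return al < bl + bw and bl < al + aw and at < bt + bh and bt < at + ah


def cluster_boxes(boxes: List[BoxT]) -> List[List[int]]:
    """Group box indices into clusters of transitively overlapping boxes."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if is_overlap(boxes[i], boxes[j]):
                parent[find(j)] = find(i)

    clusters: Dict[int, List[int]] = {}
    for i in range(len(boxes)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


boxes = [(0, 0, 10, 10), (5, 5, 10, 10), (100, 100, 5, 5), (12, 12, 10, 10)]
clusters = cluster_boxes(boxes)  # boxes 0 and 3 are linked only through box 1
```

The same idea applies to Spans by swapping the overlap test for an interval check on start and end.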
For Box, I also added a utility method, shrink(). It's not used anywhere yet, but it's helpful for debugging.
For BoxGroup and SpanGroup, I've added a new flag in the constructor: allow_overlap: Optional[bool] = False. The default preserves the existing behavior, which is that a group disallows overlaps among its own Boxes or Spans. For example, this should catch a single BoxGroup that has duplicate Boxes or a single SpanGroup that has duplicate Spans. Otherwise, the classes behave as they originally did.
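A minimal sketch of how such a flag can gate the check, using made-up stand-in classes (the real SpanGroup has more fields, and its overlap check may be implemented differently). The pairwise comparison here is just the simplest thing that works for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional


@dataclass(frozen=True)
class Span:  # hypothetical stand-in, not the library's Span
    start: int
    end: int

    def is_overlap(self, other: "Span") -> bool:
        # Half-open intervals: [0, 5) and [5, 10) do not overlap.
        return self.start < other.end and other.start < self.end


class SpanGroup:  # hypothetical stand-in, not the library's SpanGroup
    def __init__(self, spans: List[Span], allow_overlap: Optional[bool] = False):
        if not allow_overlap:
            # Compare every pair; duplicates count as overlapping.
            for a, b in combinations(spans, 2):
                if a.is_overlap(b):
                    raise ValueError(f"Spans within a group must not overlap: {a} and {b}")
        self.spans = list(spans)
        self.allow_overlap = allow_overlap


ok = SpanGroup([Span(0, 5), Span(5, 10)])                         # adjacent, fine
dupes = SpanGroup([Span(0, 5), Span(0, 5)], allow_overlap=True)   # explicitly allowed
# SpanGroup([Span(0, 5), Span(0, 5)]) would raise ValueError
```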
For Indexers, I added a new BoxGroupIndexer class that behaves similarly to SpanGroupIndexer. It's not used yet, but it was helpful for debugging.
Besides the above library changes, almost everything else I added was a missing test.