chaimleib / intervaltree

A mutable, self-balancing interval tree. Queries may be by point, by range overlap, or by range containment.
Apache License 2.0
636 stars 108 forks source link

Inefficiency In Overlaps Call #127

Open jgaupp-accuragen opened 1 year ago

jgaupp-accuragen commented 1 year ago

Overlaps performs much poorer than Overlap when running a tree with many small intervals.

The performance problem was found when calling Overlaps against a tree of ~450,000 intervals typically around 400 units wide. The problem was not obvious when calling Overlaps against a tree of ~5,000 intervals typically around 100,000 units wide.

When calling overlaps against the tree of 450K intervals, a large set of queries completed in ~15 minutes, while overlap took ~12 seconds.

cProfile identified the time spent at -

731923 1414.870 0.002 1414.905 0.002 /<venv path>/python3.10/site-packages/intervaltree/intervaltree.py:616(<genexpr>)

where column 2 is total time (1414.870) and the the operation referenced at intervaltree.py:616 is -

return any(
    self.overlaps_point(bound)
    for bound in self.boundary_table
    if begin < bound < end
) 
jgaupp-accuragen commented 1 year ago

And thanks for your effort in creating this library.

It's provided an excellent balance of size and performance, when dictionaries proved prohibitive by memory use and linear interval searches were unwieldy or impossible to implement.