Open jgaupp-accuragen opened 1 year ago
And thanks for your effort in creating this library.
It's provided an excellent balance of size and performance, when dictionaries proved prohibitive by memory use and linear interval searches were unwieldy or impossible to implement.
Overlaps performs much poorer than Overlap when running a tree with many small intervals.
The performance problem was found when calling Overlaps against a tree of ~450,000 intervals typically around 400 units wide. The problem was not obvious when calling Overlaps against a tree of ~5,000 intervals typically around 100,000 units wide.
When calling overlaps against the tree of 450K intervals, a large set of queries completed in ~15 minutes, while overlap took ~12 seconds.
cProfile identified the time spent at -
731923 1414.870 0.002 1414.905 0.002 /<venv path>/python3.10/site-packages/intervaltree/intervaltree.py:616(<genexpr>)
where column 2 is total time (1414.870) and the the operation referenced at intervaltree.py:616 is -