Closed aelzeiny closed 4 years ago
Since these changes were not merged into intervaltree
, may I suggest you to have a look at portion
(see https://github.com/AlexandreDecan/portion)? It's a Python library I made (formerly known as python-intervals
) that supports these operations and that implements some of the features of intervaltree
(look at our IntervalDict
implementation). Notice however this implementation is currently based on lists and not on a tree, so performance are expected to be worse (there are plans to implement these as a tree).
I'm hesitant to add these because there isn't a well-defined way to process the intervals' data field.
I agree with you @chaimleib , that's why I suggested @aelzeiny to have a look at portion
that already provides these operations ;-)
Hey @AlexandreDecan, I appreciate the suggestion for your library, but an interval tree was necessary to achieve performance in the project that I was working on. I'm not joking when I say that we took an O(n^3log*n) algo that ran on production for 10 hours and cut it down to 5 minutes by monkey-patching this library
@chaimleib Fair point. We could remove the operators (which are syntactic sugar) and just put plain-old methods that take in an optional data field. Leave it up to the user.
I see this is the pattern that you often follow with the IntervalTree class.
Hi @aelzeiny ! Most operations implemented in portion
are very efficient. For atomic intervals (i.e. intervals composed of a single continuous interval), they are in constant time. For intervals (i.e., disjunction of atomic intervals), they are mostly bounded by a O(n log n) since I rely on a sorted list of intervals. Worse case scenario is the same than with an interval tree, while I have to agree that, on average, an interval tree can be slightly faster for some operations. However, our IntervalDict
class (that allows to store extra data along with intervals, and that has a combine
method that accepts an arbitrary callable to combine values) relies on a plain list (so it adds up O(n), where n is the number of distinct values (not keys) that are stored). A short term plan is to improve this class by using an interval tree but this is still a work in progress, and it's not that easy since our first-class citizens are disjunctions of intervals, and not atomic intervals.
[Edited: Also, I wanted to say I'm sorry, that's not the right place to discuss about such details of portion
. If you're interested to pursue this discussion, can I ask you to open an issue here ?]
Closing, will not do.
3 interval functions that you find in the
set()
builtin class. Intersection, difference, and union.I would rather show than tell.
In code that means.
In the future, intervaltree can have better functionality in terms of slicing, intersecting, and unioning. I thought that's what the intersection function did, but after looking at the implementation I misunderstood. This PR is the first step that would help get there.
For example, here are some things we can add in the future after this PR is merged.
chop(other_iv)
in a loop]