chaimleib / intervaltree

A mutable, self-balancing interval tree. Queries may be by point, by range overlap, or by range containment.
Apache License 2.0
636 stars 108 forks source link

Add intersection, union, and difference methods #97

Closed aelzeiny closed 4 years ago

aelzeiny commented 4 years ago

3 interval functions that you find in the set() builtin class. Intersection, difference, and union.

I would rather show than tell.

A:   ==========
B:      =====
A&B:    =====
A|B: ==========
A-B: ===     ==
B-A:

In code that means.

a = Interval(0, 10)
b = Interval (3, 7)
assert a & b == a.intersection(b) == Interval(3, 7)
assert a | b == a.union(b) == Interval(0, 10)
assert a - b == a.difference(b) == [Interval(0, 3), nterval(7, 10)]
assert b - a == b.difference(a) == []

In the future, intervaltree can have better functionality in terms of slicing, intersecting, and unioning. I thought that's what the intersection function did, but after looking at the implementation I misunderstood. This PR is the first step that would help get there.

For example, here are some things we can add in the future after this PR is merged.

AlexandreDecan commented 4 years ago

Since these changes were not merged into intervaltree, may I suggest you to have a look at portion (see https://github.com/AlexandreDecan/portion)? It's a Python library I made (formerly known as python-intervals) that supports these operations and that implements some of the features of intervaltree (look at our IntervalDict implementation). Notice however this implementation is currently based on lists and not on a tree, so performance are expected to be worse (there are plans to implement these as a tree).

chaimleib commented 4 years ago

I'm hesitant to add these because there isn't a well-defined way to process the intervals' data field.

AlexandreDecan commented 4 years ago

I agree with you @chaimleib , that's why I suggested @aelzeiny to have a look at portion that already provides these operations ;-)

aelzeiny commented 4 years ago

Hey @AlexandreDecan, I appreciate the suggestion for your library, but an interval tree was necessary to achieve performance in the project that I was working on. I'm not joking when I say that we took an O(n^3log*n) algo that ran on production for 10 hours and cut it down to 5 minutes by monkey-patching this library

aelzeiny commented 4 years ago

@chaimleib Fair point. We could remove the operators (which are syntactic sugar) and just put plain-old methods that take in an optional data field. Leave it up to the user.

I see this is the pattern that you often follow with the IntervalTree class.

AlexandreDecan commented 4 years ago

Hi @aelzeiny ! Most operations implemented in portion are very efficient. For atomic intervals (i.e. intervals composed of a single continuous interval), they are in constant time. For intervals (i.e., disjunction of atomic intervals), they are mostly bounded by a O(n log n) since I rely on a sorted list of intervals. Worse case scenario is the same than with an interval tree, while I have to agree that, on average, an interval tree can be slightly faster for some operations. However, our IntervalDict class (that allows to store extra data along with intervals, and that has a combine method that accepts an arbitrary callable to combine values) relies on a plain list (so it adds up O(n), where n is the number of distinct values (not keys) that are stored). A short term plan is to improve this class by using an interval tree but this is still a work in progress, and it's not that easy since our first-class citizens are disjunctions of intervals, and not atomic intervals.

[Edited: Also, I wanted to say I'm sorry, that's not the right place to discuss about such details of portion. If you're interested to pursue this discussion, can I ask you to open an issue here ?]

chaimleib commented 4 years ago

Closing, will not do.