chaimleib / intervaltree

A mutable, self-balancing interval tree. Queries may be by point, by range overlap, or by range containment.
Apache License 2.0
636 stars 108 forks source link

Add support for points #4

Open chaimleib opened 9 years ago

chaimleib commented 9 years ago

Add support for points within the IntervalTree.

These will be implemented as a new Point object.

chaimleib commented 9 years ago

Concern no. 1: Mixing Points among the Intervals could make it difficult to get just the Points or the Intervals.

Thoughts: Filtering has O(r) time complexity, while querying for a range has O(r*log n) time complexity (r is length of queried range, n is size of IntervalTree). Query and filtering times both scale linearly with r. Query time scales logarithmically with n.

Conclusion: Small gains from optimizing for this concern. Probably not worth optimization.

chaimleib commented 9 years ago

Concern no. 2: Will the addition of points improve the tree structure or degrade it?

Thoughts: Assume for analysis that Points have a low probability of equalling a Node's x_center.

If so, a new Node will be added in the left or right branch, and there will be minimal chance of increasing the size of s_center. This is close to optimal behavior, since as s_center grows, queries begin to take longer than O(log n), so keeping s_center small is good.

However, as the Nodes become more densely packed, it becomes more likely that a later Interval added will be added to an s_center. The tree will degenerate more quickly to the worst-case scenario of having a few x_center sets containing all the Intervals and Points.

A possible solution involves using a more sophisticated algorithm for deciding the x_center of a Node, and more aggressively rebalancing to avoid growing s_center. This requires further analysis, and is also applicable to the IntervalTree as it exists without Points.

Conclusion: Make a new issue #5 to analyze this optimization.

chaimleib commented 9 years ago

To ease debugging, I'd like to separate this new functionality into a PointTree object. This will serve as the backend storage for all point objects in the intervaltree. Big-O performance parameters should be no worse than the Intervaltree alone.

lyschoening commented 9 years ago

Hi @chaimleib

How would you recommend someone should currently deal with 'Null' intervals, while this issue remains open? IMO 'Null' intervals are valid intervals too, and they would be preferable over Point objects in many situations due to the additional external logic required to distinguish between the two.

I'm tempted to simply patch Interval.is_null() for now, but I don't know if that might cause any issues because I don't how is_null() is used other than for validation.

chaimleib commented 9 years ago

In the past, null intervals caused infinite recursion, which is why I added the verification. You could try using small, non-null intervals, since the numbers don't have to be integers. Will this work for your application?

lyschoening commented 9 years ago

For my purposes, adding 1 works, so I will do that for now.

tthorleifson commented 1 year ago

@chaimleib, I'd like to second the request of support for point objects in the interval tree. Coming from the GIS world, we have the concept of both point and linear events along a line. For instance, a length of pipe along a pipeline would be treated as a "linear event" (a non-null interval with non-zero length), but a leak location on the pipeline is typically treated as "point event" (a null interval with zero length). Although the GIS requires us to store point and linear event layers separately, when performing interval tree operations (what GIS folks call "dynamic segmentation" operations), it very handy to be able to treat them all together. BTW, @lyschoening's workaround of adding 1 works fine as a work around, but it causes problems at the end of a line, because you have to remember to subtract 1, instead. (At least, if you want to preserve true line length / begin and end interval ranges in the tree.)