Open chaimleib opened 9 years ago
Concern no. 1: Mixing Points among the Intervals could make it difficult to get just the Points or the Intervals.
Thoughts: Filtering has O(r) time complexity, while querying for a range has O(r*log n) time complexity (r is length of queried range, n is size of IntervalTree). Query and filtering times both scale linearly with r. Query time scales logarithmically with n.
Conclusion: Small gains from optimizing for this concern. Probably not worth optimization.
Concern no. 2: Will the addition of points improve the tree structure or degrade it?
Thoughts: Assume for analysis that Points have a low probability of equalling a Node's x_center.
If so, a new Node will be added in the left or right branch, and there will be minimal chance of increasing the size of s_center. This is close to optimal behavior, since as s_center grows, queries begin to take longer than O(log n), so keeping s_center small is good.
However, as the Nodes become more densely packed, it becomes more likely that a later Interval added will be added to an s_center. The tree will degenerate more quickly to the worst-case scenario of having a few x_center sets containing all the Intervals and Points.
A possible solution involves using a more sophisticated algorithm for deciding the x_center of a Node, and more aggressively rebalancing to avoid growing s_center. This requires further analysis, and is also applicable to the IntervalTree as it exists without Points.
Conclusion: Make a new issue #5 to analyze this optimization.
To ease debugging, I'd like to separate this new functionality into a PointTree object. This will serve as the backend storage for all point objects in the intervaltree. Big-O performance parameters should be no worse than the Intervaltree alone.
Hi @chaimleib
How would you recommend someone should currently deal with 'Null' intervals, while this issue remains open? IMO 'Null' intervals are valid intervals too, and they would be preferable over Point
objects in many situations due to the additional external logic required to distinguish between the two.
I'm tempted to simply patch Interval.is_null()
for now, but I don't know if that might cause any issues because I don't how is_null()
is used other than for validation.
In the past, null intervals caused infinite recursion, which is why I added the verification. You could try using small, non-null intervals, since the numbers don't have to be integers. Will this work for your application?
For my purposes, adding 1
works, so I will do that for now.
@chaimleib, I'd like to second the request of support for point objects in the interval tree. Coming from the GIS world, we have the concept of both point and linear events along a line. For instance, a length of pipe along a pipeline would be treated as a "linear event" (a non-null interval with non-zero length), but a leak location on the pipeline is typically treated as "point event" (a null interval with zero length). Although the GIS requires us to store point and linear event layers separately, when performing interval tree operations (what GIS folks call "dynamic segmentation" operations), it very handy to be able to treat them all together. BTW, @lyschoening's workaround of adding 1 works fine as a work around, but it causes problems at the end of a line, because you have to remember to subtract 1, instead. (At least, if you want to preserve true line length / begin and end interval ranges in the tree.)
Add support for points within the IntervalTree.
These will be implemented as a new Point object.