IRC-SPHERE / HyperStream

HyperStream
https://irc-sphere.github.io/HyperStream/
MIT License
13 stars 5 forks source link

time_interval split recursion limit #29

Closed perellonieto closed 6 years ago

perellonieto commented 6 years ago

The function split from the class TimeIntervals raises an exception when passing a list of about 990 points.

    @profile
    def split(self, points):
        if len(points) == 0:
            return
        p = points[-1]
        for i in range(len(self.intervals)):
            if (self.intervals[i].start < p) and (self.intervals[i].end > p):
                self.intervals = self.intervals[:i] \
                                 + [TimeInterval(self.intervals[i].start, p), TimeInterval(p, self.intervals[i].end)] \
                                 + self.intervals[(i + 1):]
        self.split(points[:-1])

This is because there is a limit in the recursion in Python to avoid possible overflows.

We should change the recursive call to a for loop.

Before I do the modification, do you see any reason why it was a recursive call from the first instance that I should consider?

perellonieto commented 6 years ago

I tried in my local machine the following code and it seems to work. However, still need to do some tests as do not understand the rest of the code yet.

    @profile
    def split(self, points):
        for p in points:
                for i in range(len(self.intervals)):
                    if (self.intervals[i].start < p) and (self.intervals[i].end > p):
                        self.intervals = (self.intervals[:i]
                                         + [TimeInterval(self.intervals[i].start, p),
                                            TimeInterval(p, self.intervals[i].end)]
                                         + self.intervals[(i + 1):])
perellonieto commented 6 years ago

I have been looking at the function and there is something that intrigues me. Is it a true condition that the original intervals are non-overlaping? meaning that any given point can only be in one of the intervals?? If that is true, then the for loop do not need to go trough all the intervals once it has found the one in which it falls, and you could break. If that is not true there is a possible error as the intervals are getting longer but the original indices only iterate for the original intervals length. Every time that you split an interval you may miss one interval at the end.

tdiethe commented 6 years ago

Yes the original intervals must be non-overlapping, otherwise they would have been merged. So yes you are correct that a point can only be one. Your for loop should be fine. I hadn't considered that the recursion depth wasn't actually a bug but python being overprotective!

perellonieto commented 6 years ago

That is great! I will do a pull request (or push the change if I have permission).

perellonieto commented 6 years ago

Commit 7c4747a324d2a53dd33b291b72464d7476fb8d06 fixes #29. I added as a comment that the time intervals need to be non-overlapping. In that way, I break the internal for loop when the right interval is found per each point.