dengwirda / inpoly-python

A fast 'point(s)-in-polygon' test for Python.

There are a couple of asymptotically faster methods. #14

Closed tatarize closed 1 year ago

tatarize commented 1 year ago

If you use a little bit of memory you can store a scanbeam table. At any given y there is some number of active edges (k). You sort the y-values of the endpoints into a scanbeam: each event either adds an edge to the active-edge list (a start-y event) or removes it (an end-y event). The result is a set of y-range strips between events, each associated with the segments active in that y-range.

[scanbeam diagram]

You can then binary search these y-values to find, at a given y, a static, already complete list of active edges. With m edges and n test points, the time complexity is O(2*m*log(m)) for the sort, O(2m) for the scanbeam table generation, and O(log(m) * k) per PIP test. The PIP tests are the biggest piece of this, since there are n of them, which gives us O(n * log(m) * k).
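As a toy illustration of the table (a sketch of mine, not the implementation further down), take a triangle with vertices (0,0), (4,0), (2,3): there is a single strip between the two event y-values, holding the two slanted edges, and a query is just a bisect into the event list.

from bisect import bisect_right

# Events are the sorted distinct y-values of the vertices; the horizontal
# bottom edge has zero y-extent and never becomes active.
events = [0.0, 3.0]

# actives_table[i] holds the edges active for events[i] <= y < events[i + 1];
# edges are written out by their endpoints here for readability.
actives_table = [
    [((0, 0), (2, 3)), ((4, 0), (2, 3))],
]

def active_edges(y):
    i = bisect_right(events, y) - 1
    if i < 0 or i >= len(actives_table):
        return []          # below or above the polygon entirely
    return actives_table[i]

active_edges(1.5)   # both slanted edges
active_edges(5.0)   # [] -- above the triangle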


Further, we can not only build the scanbeam but also find the self-intersections with Bentley-Ottmann and break the lines at those intersections (rather than keeping the look-ahead intersection events in the event heap). Then, when we calculate the scanbeam table (with events now at the self-intersections of the polygon too), we can guarantee that if we sort the active segments by their x-values, there is no point within a scanbeam at which any segment falls out of that order. We do need to take special care that segments lying above other segments are placed above them in the sorted order even when their start (or end) points are coincident (sorting by the abscissa (x) of the segments' midpoints within the scanbeam would do this).

This adds a lot more to our setup. We not only have the m events but also the i intersections of the polygon, though m+i is very likely to be roughly equal to m. This means our setup is going to take O(m*log(m)) for the sort and O((m+i)*log(m)) time to find the intersections. We can actually keep the k active edges sorted for free, because they only swap at the intersections, which means we can swap them there and stay sorted (but it's probably much easier to just sort them, since setup should be dwarfed by the PIP tests). And because our active edges are sorted, we can now also binary search into them: each PIP test takes O(log(m+i) * log(k)), and all n of them take O(n * log(m+i) * log(k)). However, since our average k is going to be like 2, maybe 4, this second step is probably not worth it. The first one is, though.
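As a rough sketch of that splitting step (mine, and a brute-force O(m^2) stand-in rather than real Bentley-Ottmann): split every edge at every crossing, so the resulting pieces only meet at endpoints and can never swap x-order inside a scanbeam.

from itertools import combinations

def segment_intersection(p0, p1, q0, q1, eps=1e-12):
    # Return the crossing point of segments p0-p1 and q0-q1 (complex), or None.
    d1, d2, w = p1 - p0, q1 - q0, q0 - p0
    denom = d1.real * d2.imag - d1.imag * d2.real
    if abs(denom) < eps:
        return None                             # parallel or collinear
    t = (w.real * d2.imag - w.imag * d2.real) / denom
    u = (w.real * d1.imag - w.imag * d1.real) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return p0 + t * d1
    return None

def split_at_self_intersections(polygon):
    # polygon: sequence of complex vertices; returns edge pieces that only
    # meet at endpoints, so no piece crosses another within a scanbeam strip.
    edges = list(zip(polygon[:-1], polygon[1:]))
    cuts = [[] for _ in edges]
    for (i, (a0, a1)), (j, (b0, b1)) in combinations(enumerate(edges), 2):
        x = segment_intersection(a0, a1, b0, b1)
        if x is None:
            continue
        if min(abs(x - a0), abs(x - a1)) > 1e-9:
            cuts[i].append(x)
        if min(abs(x - b0), abs(x - b1)) > 1e-9:
            cuts[j].append(x)
    pieces = []
    for (a0, a1), cs in zip(edges, cuts):
        pts = [a0] + sorted(cs, key=lambda c: abs(c - a0)) + [a1]
        pieces.extend(zip(pts[:-1], pts[1:]))
    return pieces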

tatarize commented 1 year ago
from bisect import bisect

def build_edge_list(polygon):
    edge_list = []
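    # Each edge i runs polygon[i-1] -> polygon[i]; its lower endpoint (by y,
    # then x) is stored with index i (start event), its upper with ~i (end event).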
    for i in range(1, len(polygon)):
        if (polygon[i].imag, polygon[i].real) < (polygon[i - 1].imag, polygon[i - 1].real):
            edge_list.append((polygon[i], i))
            edge_list.append((polygon[i - 1], ~i))
        else:
            edge_list.append((polygon[i], ~i))
            edge_list.append((polygon[i - 1], i))

    def sort_key(e):
        return e[0].imag, e[0].real, ~e[1]

    edge_list.sort(key=sort_key)
    return edge_list

def build_scanbeam(edge_list):
    actives_table = []
    events = []
    y = -float("inf")
    actives = []
    for pt, index in edge_list:
        if y != pt.imag:
            actives_table.append(list(actives))
            events.append(y)
        if index >= 0:
            actives.append(index)
        else:
            actives.remove(~index)
        y = pt.imag
    return actives_table, events

def point_in_polygon(polygon, point):
    def x_intercept(e, y):
        pt0 = polygon[e-1]
        pt1 = polygon[e]
        if pt1.real - pt0.real == 0:
            return pt0.real
        m = (pt1.imag - pt0.imag) / (pt1.real - pt0.real)
        b = pt0.imag - (m * pt0.real)
        return (y - b) / m

    edge_list = build_edge_list(polygon)
    actives_table, events = build_scanbeam(edge_list)
    try:
        if len(point):
            return [point_in_scantable(actives_table, events, x_intercept, p) for p in point]
    except TypeError:
        return point_in_scantable(actives_table, events, x_intercept, point)

def point_in_scantable(actives_table, events, xi, point):
    beam = bisect(events, point.imag) - 1  # Binary search in sorted array.
    actives_at_y = actives_table[beam]
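    # Count the active edges whose x-intercept at this y lies strictly left of
    # the point; an odd count means the point is inside.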
    total = sum([point.real > xi(e, point.imag) for e in actives_at_y])
    return bool(total % 2)
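For reference, a minimal usage sketch (the variable names here are just mine): both the polygon and the query points are complex numbers of the form x + y*1j.

square = [0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j, 0 + 0j]       # closed ring, x + y*1j
single = point_in_polygon(square, 0.5 + 0.5j)            # one complex point
many = point_in_polygon(square, [0.5 + 0.5j, 2 + 2j])    # a sequence of points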

It's not written in C so it's decidedly slower, but it starts to close the gap at numbers greater than about 50000 in the tests I ran on it.

tatarize commented 1 year ago
500: scanbeam 3.111207962036133, inpoly 1.078752040863037 -- 0.3467309334593779
5000: scanbeam 3.816012382507324, inpoly 1.0874779224395752 -- 0.28497756648395467
50000: scanbeam 10.510129928588867, inpoly 1.752586841583252 -- 0.1667521575367015
500000: scanbeam 78.90599799156189, inpoly 10.414604425430298 -- 0.13198748752336956
5000000: scanbeam 758.0127830505371, inpoly 116.62096095085144 -- 0.15385091591928504
50000000: scanbeam 7535.422480583191, inpoly 1766.688975572586 -- 0.23445121758267445

This is with m set to 50000, running scanbeam 10 times and inpoly 100 times, so the true per-run ratio is actually a lot smaller than shown. But you can see from the sample sizes that the ratio is rebounding at the high end. It might take a few trillion points before it overtakes the C++.

    def test_inpoly_vs_pip(self):
        for exp in range(2, 8):
            N = 5 * 10 ** exp

            lenpoly = 50000
            polygon = [
                [np.sin(x) + 0.5, np.cos(x) + 0.5]
                for x in np.linspace(0, 2 * np.pi, lenpoly)
            ]
            polygon = np.array(polygon, dtype="float32")

            points = np.random.uniform(-1.5, 1.5, size=(N, 2)).astype("float32")

            t = time.time()
            for i in range(100):
                IN, ON = inpoly2(points, polygon)
            time_ip = time.time() - t
            points = points[:, 0] + points[:, 1] * 1j
            pg = polygon[:, 0] + polygon[:, 1] * 1j
            t = time.time()
            for i in range(10):
                pip = point_in_polygon(pg, points)
            time_sb = time.time() - t
            print(f"{N}: scanbeam {time_sb}, inpoly {time_ip} -- {time_ip / time_sb}")

Also, it's a big circle, so a bumpier shape might show the differences better; for a circle the overlap k is 2, and log(2) is pretty close to 2.

dengwirda commented 1 year ago

Hi @tatarize, thanks for your thoughts here. The current inpoly implementation does indeed use a sorted approach already though --- sorting the points-to-be-tested (along the 'long' axis of the polygon), using a binary search + bbox shortcuts to determine the set of points to test against each edge, and generally returning early from the full crossing-number tests wherever possible. I don't believe it's the same approach you outline above, but --- for an (average-case) problem with N points and M edges --- the overall scaling is O((N+M)*log(N)).
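For concreteness, here's a plain-Python sketch of a crossing-number test with a bounding-box shortcut (illustrative only, not the actual Cython kernel, which also sorts the query points and binary searches along the polygon's long axis):

def crossing_number_test(x, y, polygon):
    # polygon: list of (px, py) vertices with polygon[0] == polygon[-1].
    inside = False
    for (x0, y0), (x1, y1) in zip(polygon[:-1], polygon[1:]):
        if (y0 > y) == (y1 > y):
            continue                  # edge does not straddle the ray's y
        if min(x0, x1) > x:
            inside = not inside       # bbox shortcut: edge lies wholly to the right
            continue
        if max(x0, x1) < x:
            continue                  # bbox shortcut: edge lies wholly to the left
        # Full test: where does the edge cross the horizontal ray through y?
        xi = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
        if xi > x:
            inside = not inside
    return inside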

Python is typically quite a poor language for implementing these kinds of loop-based algorithms, so the inpoly library leverages Cython to build an efficient back-end.

If there are faster (in practice) algorithms out there, it'd be great to explore them, though running the bake-off above generates the following for me:

500: scanbeam 1.8576548099517822, inpoly 0.31999850273132324 -- 0.17225939987183583
5000: scanbeam 2.2886641025543213, inpoly 0.3571152687072754 -- 0.1560365578805155
50000: scanbeam 6.131136655807495, inpoly 0.6439507007598877 -- 0.10502957883835927
500000: scanbeam 45.59218692779541, inpoly 3.835197925567627 -- 0.08411963066482094
5000000: scanbeam 440.37712574005127, inpoly 41.76016664505005 -- 0.09482819202943914

and, as you say, this is even with the existing inpoly doing the loop 10x more.

tatarize commented 1 year ago

The sorted approach makes the overall algorithm sublinear, and the lightning-fast backend makes it a bit hard to beat (in raw performance, not time complexity). I think the scanbeam might win somewhere in the hundred trillions. But purely in time complexity you can do O(log(N)*log(K)) per point (where K is the average overlap of the M sides). You can precache the overlap search so the needed edges can be found quickly, and, by adding additional breaks at the self-intersections, binary search to find the active edges at a position y and then binary search again to find the position within those edges for a value x (since we added breaks at the self-intersections, no segment can ever overtake another with respect to x within a scanbeam). That reduces the per-point work to a log search for y and a log search for x.

In fact, the above algorithm is a modified version of Martinez-Rueda clipping, since you've reduced the question of insideness to the parity of the edges within a sorted list of sorted lists. To do clipping you need two or more different origins for the edges, and you search for all segments that bound the insideness of the combined geometry on one side but not the other, then chain all those edges back together.

I'll finish up some more of the implementation requirements (the initial build that runs the scanline etc. is actually a bit more expensive, since I not only have to sort the y-values but also identify the self-intersecting segments) and try to beat your time, though with CPython, even using numpy, that might be hard to do. But it should be trivial to show that it gets the right answer with two calls to numpy.searchsorted() or bisect(), which clearly gives a log(n)*log(k) time complexity. -- Though it is somewhat questionable how much more you can eke out of what is essentially log(n).
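A minimal sketch of that two-binary-search query (mine; events, strip_segments and x_intercept are assumed to come from a setup phase that has already split the edges at their self-intersections and sorted each strip's actives left-to-right):

from bisect import bisect_right

def query(point, events, strip_segments, x_intercept):
    # strip_segments[i] covers events[i] <= y < events[i + 1].
    i = bisect_right(events, point.imag) - 1      # log(m + i) strip lookup
    if i < 0 or i >= len(strip_segments):
        return False                              # outside the polygon's y-range
    segs = strip_segments[i]
    # log(k) search: count actives whose x-intercept at this y is left of the
    # point, using the fact that the strip's actives never swap x-order.
    lo, hi = 0, len(segs)
    while lo < hi:
        mid = (lo + hi) // 2
        if x_intercept(segs[mid], point.imag) < point.real:
            lo = mid + 1
        else:
            hi = mid
    return bool(lo % 2)                           # odd count to the left == inside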

tatarize commented 1 year ago

...See the later code for a less broken version.

Rewritten in numpy. With equal numbers of executions it gets within an order of magnitude of inpoly, and in rare tests runs faster. It's still the O(K*log(N)) version: one searchsorted, then all the x-intercepts are calculated and summed.

200: scanbeam 0.09850502014160156, inpoly 0.13092470169067383 -- 1.3291170490851003
2000: scanbeam 0.1839127540588379, inpoly 0.1301274299621582 -- 0.7075497870067644
20000: scanbeam 0.7567081451416016, inpoly 0.6630463600158691 -- 0.8762246901568561
200000: scanbeam 6.381136894226074, inpoly 2.909923553466797 -- 0.4560196093112844
2000000: scanbeam 60.8707754611969, inpoly 42.052160024642944 -- 0.6908431789479962
20000000: scanbeam 647.8858635425568, inpoly 607.0228259563446 -- 0.9369286476436168

This is with a lenpoly of 5, and the polygon is just a 5-pointed circle. At the end there it's 93% as fast, but the more important bit is that it's trending towards scanbeam at higher numbers. It's not a great test, since the overlap between the active edges matters a lot, and for a circle it's == 2. But with any linear algorithm this would be a total runaway at bigger numbers.

dengwirda commented 1 year ago

@tatarize I tried your new implementation on the examples that ship with inpoly and it seems there may be some issues. It looks like very incorrect results are returned for the 1st and 2nd cases (does your approach work for multiply-connected polygons?), and somewhat incorrect results are returned for the 3rd case (which is just a simple polygon). It also looks like there may be issues for points lying exactly on edges or vertices of the polygon...

Running the large 3rd case, it seems the scanbeam approach may start to run out of memory(?) for > 10M points, at which stage it's still slower than the current inpoly strategy.

tatarize commented 1 year ago
import numpy as np

def build_edge_list(polygon):
    edge_list = []
    for i in range(0, len(polygon) - 1):
        if (polygon[i].imag, polygon[i].real) < (polygon[i + 1].imag, polygon[i + 1].real):
            edge_list.append((polygon[i], i))
            edge_list.append((polygon[i + 1], ~i))
        else:
            edge_list.append((polygon[i], ~i))
            edge_list.append((polygon[i + 1], i))

    def sort_key(e):
        return e[0].imag, e[0].real, ~e[1]

    edge_list.sort(key=sort_key)
    return edge_list

def build_scanbeam(edge_list):
    actives = []
    actives_table = []
    events = []
    y = -float("inf")
    for pt, index in edge_list:
        if y != pt.imag:
            actives_table.append(list(actives))
            events.append(pt.imag)
        if index >= 0:
            actives.append(index)
        else:
            actives.remove(~index)
        y = pt.imag
    actives_table.append(list(actives))
    largest_actives = max([len(a) for a in actives_table])
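    # Pad the per-strip active lists into one rectangular array; -1 marks an unused slot.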
    scan = np.zeros((len(actives_table), largest_actives), dtype=int)
    scan -= 1
    for i, active in enumerate(actives_table):
        scan[i, 0: len(active)] = active
    return scan, events

def point_in_polygon(polygon, point):
    edge_list = build_edge_list(polygon)
    scan, events = build_scanbeam(edge_list)
    pts_y = np.imag(point)
    idx = np.searchsorted(events, pts_y)
    actives = scan[idx]
    a = polygon[actives]
    b = polygon[actives + 1]

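    # Padded slots (actives == -1) become NaN so they never count as crossings.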
    a = np.where(actives == -1, np.nan + np.nan * 1j, a)
    b = np.where(actives == -1, np.nan + np.nan * 1j, b)

    old_np_seterr = np.seterr(invalid="ignore", divide="ignore")
    try:
        # Vertical edges have an undefined slope, but their x-intercepts are all at x0 (== x1).
        m = (b.imag - a.imag) / (b.real - a.real)
        y0 = a.imag - (m * a.real)
        ys = np.reshape(np.repeat(np.imag(point), y0.shape[1]), y0.shape)
        x_intercepts = np.where(~np.isinf(m), (ys - y0) / m, a.real)
    finally:
        np.seterr(**old_np_seterr)

    xs = np.reshape(np.repeat(np.real(point), y0.shape[1]), y0.shape)
    results = np.sum(x_intercepts <= xs, axis=1)
    results %= 2
    return results

This should give the correct answers now; I went over it and fixed the couple of bugs that were there.

    def test_pip_vs_inpoly(self):
        N = 500000
        lenpoly = 50
        circle = True
        if circle:
            polygon = [
                [np.sin(x) + 0.5, np.cos(x) + 0.5]
                for x in np.linspace(0, 2 * np.pi, lenpoly)
            ]
        else:
            polygon = np.random.uniform(-1.5, 1.5, size=(lenpoly, 2)).astype("float32")
            polygon[-1] = polygon[0]  # Closed polygon

        polygon = np.array(polygon, dtype="float32")
        points = np.random.uniform(-3.5, 3.5, size=(N, 2)).astype("float32")

        t = time.time()
        IN, ON = inpoly2(points, polygon)
        time_inpoly = time.time() - t

        # Convert the points and polygon to the complex-number format used above.
        points = points[:, 0] + points[:, 1] * 1j
        pg = polygon[:, 0] + polygon[:, 1] * 1j

        t = time.time()
        scan_answers = point_in_polygon(pg, points)
        time_scanline = time.time() - t
        for p0, p1 in zip(IN, scan_answers):
            assert bool(p0) == bool(p1)
        print(f"scanline took {time_scanline} seconds. inpoly took {time_inpoly}")

As the test.

Sometimes inpoly ran out of memory first; I had to switch to 64-bit Python to allocate enough memory. It does worse with random polygons than with circles, since that increases the overlap, and it's just doing a sum there rather than a searchsorted, because making the edges sorted would require inserting events at the intersections so that we can be sure no segment crosses another within a scanbeam.

It's really hard to beat the C implementation there, and doing so would almost certainly require the log(k)*log(m) rather than the k*log(m) per-point check.

dengwirda commented 1 year ago

@tatarize thanks for following up on this --- new ideas on fast algorithms are definitely welcome. Running the inpoly examples with the new code above is still showing some correctness issues though, and I feel we can perhaps close this as an issue and think about further contributions as a PR if that's of interest.

While the inpoly examples are simple enough in many cases, they also function as unit tests to establish behaviours that would need to be kept in the mix if you wanted to move ahead with a PR:

  1. Dealing with multiply-connected polygons would need to be part of the design, so that configurations with various inner and outer contours can be processed.
  2. Algorithms are often sensitive to floating-point round-off, so some subtlety may be required near polygon boundaries. In addition to an inside/outside classification, returning which points lie 'on' boundaries is also part of the current design.
  3. Efficiency is often hard to chase with Python, with it being such a slow language when left to its own devices. It's possible that the current Cython-ised O((M+N)*log(N)) implementation may be a little unfair in this regard, but on the other hand it's the in-practice numbers that win out in the end :). I'd say the 3rd example is a not-unreasonable test-case in this regard --- classifying a few hundred thousand or a million points against a complex polygon with O(10,000) edges, which is pretty standard for geospatial-type workflows.

Thanks again for proposing and working through the details here. As we use these algorithms in various operational workflows, ensuring all of the i's are dotted and t's are crossed is something I keep in mind with any updates, and positive outcomes on the criteria above would need to be established before swapping out the core algorithms here.