JYU-IBA / potku

Potku is analysis and simulation software for ToF-ERD measurements
https://www.jyu.fi/science/en/physics/research/infrastructures/accelerator-laboratory/pelletron/potku/
GNU General Public License v2.0

Slow implementation of inside of polygon algorithm #259

Closed tleppala closed 2 years ago

tleppala commented 2 years ago

A generic algorithm is used to check whether a point is inside a selection polygon. The algorithm is called about num_points x selections times when the points in cuts are tested, and it also tests each side of the polygon. Since the data is discrete with 8k steps, a more efficient approach might be to convert the selection polygons into max_y and min_y arrays. Testing whether a point is inside a selection would then require at most 2 comparisons (using full arrays). A limitation of this method is that there can't be multiple regions at a single x (or y, if the other direction is used) position. If this is deemed to be a problem, the method can be refined for such eventualities (maybe by splitting the affected selections into multiple pieces to remove the problem areas). Also, this conversion only needs to be made once.

tpitkanen commented 2 years ago

Seems like there's a nasty nested loop in Measurement.save_cuts where the innermost functions (Selection.get_points() and math_functions.point_inside_polygon()) are slow.

Measurement.save_cuts()

        # Go through all points in measurement data
        data_count = len(self.data)
        for n in range(data_count):  # while n < data_count: 
            if n % 5000 == 0:
                # Do not always update UI to make it faster.
                if progress is not None:
                    progress.report(n / data_count * 80)
            point = self.data[n]
            # Check if point is within selectors' limits for faster processing.
            if not self.selector.axes_limits.is_inside(point):
                continue

            for i, selection in enumerate(self.selector.selections):
                if selection.point_inside(point):
                    points_in_selection[i].append(point)

Selection.point_inside()

    def point_inside(self, point):
        """Check if point is inside selection.

        Args:
            point: [X, Y] representing a point.

        Return:
            Returns True if point is within selection. False otherwise.
        """
        if not self.axes_limits.is_inside(point):
            return False
        inside = mf.point_inside_polygon((point[0], point[1]),
                                         self.get_points())
        # While at it, increase event point counts if not counted already.
        if inside and not self.events_counted:
            self.event_count += 1
        return inside

Selection.get_points()

    def get_points(self):
        """Get points in selection

        Get points in selection in list. Format: ((x1,y1), (x2,y2), ...).
        If no points, empty list is returned

        Return:
           ((x1, y1), (x2, y2), ...)
        """
        points = self.points.get_data()
        pointlist = []
        for i in range(len(points[0])):
            pointlist.append([points[0][i], points[1][i]])
        return pointlist

math_functions.point_inside_polygon()

def point_inside_polygon(point, poly):
    """Finds out if a point x, y is inside a polygon "poly"

    Determine if a point is inside a given polygon or not
    Polygon is a list of (x,y) pairs.

    Algorithm got from: http://www.ariel.com.au/a/python-point-int-poly.html
    """
    # TODO test using shapely or numpy for this
    x, y = point
    n = len(poly)
    inside = False

    p1x, p1y = poly[0]
    for i in range(n + 1):
        p2x, p2y = poly[i % n]
        if y > min(p1y, p2y):
            if y <= max(p1y, p2y):
                if x <= max(p1x, p2x):
                    if p1y != p2y:
                        xinters = (y - p1y) * (p2x - p1x) / (p2y - p1y) + \
                                  p1x
                    if p1x == p2x or x <= xinters:
                        inside = not inside
        p1x, p1y = p2x, p2y
    return inside

There's a TODO (# TODO test using shapely or numpy for this) in math_functions.point_inside_polygon(), but I think Selection.get_points() is a problem too. This could be faster if Selection.get_points() were cached with lru_cache, or maybe just called once for each selection in Measurement.save_cuts() and then passed to Selection.point_inside(). We could also store the result of Selection.get_points() in the Selection object and update it when it changes (or when it's needed), but that has the potential for cache invalidation issues.
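A minimal sketch of the "call get_points() once per selection" variant. Everything here is a toy stand-in (FakeSelection, the bounding-box point_inside_polygon, and collect_points are illustrative names, not Potku's API):

```python
def point_inside_polygon(point, poly):
    """Stand-in for math_functions.point_inside_polygon: a plain bounding-box
    test, just to keep the sketch self-contained."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return min(xs) <= point[0] <= max(xs) and min(ys) <= point[1] <= max(ys)


class FakeSelection:
    """Toy stand-in for Selection, instrumented to count get_points() calls."""
    def __init__(self, poly):
        self._poly = poly
        self.get_points_calls = 0

    def get_points(self):
        self.get_points_calls += 1
        return self._poly


def collect_points(data, selections):
    """save_cuts-style loop with get_points() hoisted out: one call per
    selection instead of one call per (point, selection) pair."""
    polygons = [s.get_points() for s in selections]
    points_in_selection = [[] for _ in selections]
    for point in data:
        for poly, bucket in zip(polygons, points_in_selection):
            if point_inside_polygon(point, poly):
                bucket.append(point)
    return points_in_selection
```

With 300k points and a dozen selections, this alone removes millions of get_points() calls per save_cuts run.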

tpitkanen commented 2 years ago

Oh, and profiling should reveal the real culprit. My last message is just an educated guess.

tleppala commented 2 years ago

Most of the time is spent (total 22s):

    tottime  cumtime
    5.0s     14.8s    selection.py: point_inside
    4.2s     19s      selection.py: update_selection_points
    3.2s     4.4s     math_functions.py: point_inside_polygon
    2.3s     3.5s     selections.py: get_points
    1.9s     1.9s     selections.py: is_inside

Using arrays for max_y and min_y (8k elements each) with one comparison (using the point's x coordinate as the index) should be equivalent to checking the x_min and x_max limits (and also y_min or y_max), and the second comparison should tell whether the point is inside the selection or not. This method should be faster than even the is_inside method (four comparisons and nested ifs). The additions to selection objects would be the y_min and y_max arrays and a method to compute them when a selection is made or loaded. After that, checking a single point is just two comparisons away (point_y < y_max[point_x] and point_y > y_min[point_x]). Default values: y_max = -1 and y_min = max_y_value + 1. As all data is integer-valued with specified ranges, there is no need to use polygons when comparing points to selections. This method is also well suited for polygons with other than straight segments.

Also, if using max_x and min_x arrays is feasible, only 4k-element arrays would be required.
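A minimal sketch of the per-column min_y/max_y idea, assuming integer vertex coordinates and that each x column intersects the selection in at most one contiguous y range. GRID, build_y_limits and point_inside are illustrative names, not Potku's:

```python
GRID = 8192  # the 8k discrete range mentioned above

def build_y_limits(poly):
    """Precompute per-x y_min/y_max arrays by walking the polygon edges.
    Assumes integer vertices and a single y-range per x column."""
    y_min = [GRID] * GRID  # default: empty column (y_min > y_max)
    y_max = [-1] * GRID
    n = len(poly)
    for k in range(n):
        (x1, y1), (x2, y2) = poly[k], poly[(k + 1) % n]
        if x1 == x2:
            # Vertical edge: the whole segment sits in one column
            y_min[x1] = min(y_min[x1], y1, y2)
            y_max[x1] = max(y_max[x1], y1, y2)
            continue
        step = 1 if x2 > x1 else -1
        for x in range(x1, x2 + step, step):
            y = y1 + (y2 - y1) * (x - x1) / (x2 - x1)  # edge height at this x
            y_min[x] = min(y_min[x], y)
            y_max[x] = max(y_max[x], y)
    return y_min, y_max

def point_inside(point, y_min, y_max):
    """Membership is now just two comparisons."""
    x, y = point
    return y_min[x] <= y <= y_max[x]
```

build_y_limits runs once per selection; after that every point test is a table lookup plus two comparisons.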

tpitkanen commented 2 years ago

Your proposed algorithm seems reasonable, but I'm worried about its limitations. While normal L-shaped(ish) selections are alright, it wouldn't work if a selection is C-shaped or similar. This kind of limitation is difficult to convey to users. Maybe this could work with the split-selection workaround, but it sounds like that would get complicated fast.

Looks like there's currently quite a bit of function call overhead involved. Selection.point_inside() takes 5s, even though the method itself doesn't look computationally intensive (assuming I'm reading the profiling results right and that 5s doesn't include the inner function calls). Maybe something like this would reduce the overhead enough without changing the used algorithm:

    # Formerly point_inside(self, point)
    def points_inside(self, points):
        polygon = self.get_points()
        matching_points = []

        for point in points:
            # This is checked in save_cuts too, remove it from there?
            if not self.axes_limits.is_inside(point):
                continue
            if mf.point_inside_polygon((point[0], point[1]), polygon):
                matching_points.append(point)
                if not self.events_counted:
                    self.event_count += 1

        return matching_points

This would need to be called len(self.selector.selections) times, instead of len(self.data) * len(self.selector.selections) times.

Multithreading could also help, if we aren't using it already.

tleppala commented 2 years ago

formerly_points_inside: tottime 4.2s, cumtime 10.1s

Multithreading might not be as helpful in this case because of the GIL. A better option would be multiprocessing, but copying the data to a new process might take some time (or maybe split the data between multiple processes at the beginning and just ask each for different selections), and it might be more work than it's worth.

Using arrays with difficult shapes might get a tad interesting, but the difficult part is only in creating the divisions; the effect on the computational part is minimal.

I'm also wondering if prescreening points with the is_inside method is really necessary, because point_inside_polygon screens them yet again... Only the min_x screening is not done again, but that should be an easy addition.

Another thing that should help quite a bit in any case would be to make fewer function calls... With some 300k data points, the top 10 (tottime) functions make some 22.5 million function calls.

tpitkanen commented 2 years ago

> Multithreading might not be as helpful in this case because of the GIL. A better option would be multiprocessing, but copying the data to a new process might take some time (or maybe split the data between multiple processes at the beginning and just ask each for different selections), and it might be more work than it's worth.

I think per-selection processes/threads are for the best. Going smaller than that is not useful because there are usually several selections anyway. Multiprocessing/threading is easy to implement with ThreadPoolExecutor or ProcessPoolExecutor in Measurement.save_cuts():

import concurrent.futures

# with concurrent.futures.ProcessPoolExecutor() as executor:  # Choose this for processes
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(s.points_inside, points) for s in self.selector.selections]
    points_in_selection = [future.result() for future in futures]

> I'm also wondering if prescreening points with the is_inside method is really necessary, because point_inside_polygon screens them yet again... Only the min_x screening is not done again, but that should be an easy addition.

Yeah, that may be redundant.

> Another thing that should help quite a bit in any case would be to make fewer function calls... With some 300k data points, the top 10 (tottime) functions make some 22.5 million function calls.

Agreed. It's faster to process entire arrays (of some kind) instead of individual elements.

We could also look into using faster runtimes like Numba, NumPy ufuncs or ctypes. However, these make the build process more complicated and they are harder to debug/write, so they are more of a last resort. They are good options to keep in mind though, in case there are enough slow parts in the code to speed up.
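As a rough illustration of the array-at-a-time idea with plain NumPy (the function and its signature are illustrative, not an existing Potku API), the rectangular axes-limits prefilter could operate on all points at once:

```python
import numpy as np

def filter_axes_limits(points, x_min, x_max, y_min, y_max):
    """Vectorized rectangular prefilter: one boolean mask instead of
    len(points) Python-level is_inside() calls."""
    pts = np.asarray(points)
    mask = ((pts[:, 0] >= x_min) & (pts[:, 0] <= x_max)
            & (pts[:, 1] >= y_min) & (pts[:, 1] <= y_max))
    return pts[mask]
```

This keeps the per-point work in C loops inside NumPy, which is exactly where the function call overhead discussed above disappears.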

tleppala commented 2 years ago

Here is a quick working version of the best of both worlds, using the original algorithm with cached points: 1.4s tottime, 1.6s cumtime. And as predicted, faster than even the selection is_inside code alone (1.8s).

    def tl_calculate_values(self, poly):
        """Precompute, per integer y row, the x coordinates where polygon
        edges cross that row, plus the largest such x for fast culling.
        Requires "import math" at module level."""
        x_list = [[] for _ in range(8192)]  # max size of data
        x_max_list = [-1] * 8192
        n = len(poly)
        p1x, p1y = poly[0]
        for k in range(n + 1):
            p2x, p2y = poly[k % n]
            if p1y != p2y:
                for y in range(math.ceil(min(p1y, p2y)),
                               math.floor(max(p1y, p2y)) + 1):
                    xinters = (y - p1y) * (p2x - p1x) / (p2y - p1y) + p1x
                    x_list[y].append(xinters)
            p1x, p1y = p2x, p2y
        for y in range(len(x_list)):
            if x_list[y]:
                x_list[y].sort(reverse=True)
                x_max_list[y] = x_list[y][0]
            else:
                x_list[y] = [9999]  # sentinel for rows with no intersections
        return x_list, x_max_list

    def tl_points_inside(self, points):
        # Rebuild the cached tables only when the selection has changed
        if self.cached_points != self.get_points():
            self.cached_points = self.get_points()
            self.tl_cached_x, self.tl_cached_x_max = \
                self.tl_calculate_values(self.cached_points)
        points_inside = []
        for point in points:
            inside = False
            if point[0] <= self.tl_cached_x_max[point[1]]:
                # Odd number of intersections left of the point = inside
                for x in self.tl_cached_x[point[1]]:
                    if point[0] > x:
                        inside = not inside
                if inside:
                    points_inside.append(point)
        return points_inside

tleppala commented 2 years ago

A little more tuning: tottime 0s and cumtime 0.9s (0.76s without the tl_calculate_values time).

    def tl_points_inside(self, points):
        if self.cached_points != self.get_points():
            self.cached_points = list(self.get_points())
            self.tl_cached_x, self.tl_cached_x_max = \
                self.tl_calculate_values(self.cached_points)
        # Odd number of intersections left of the point = inside
        points_inside = [row for row in points
                         if row[0] < self.tl_cached_x_max[row[1]]
                         and len([x for x in self.tl_cached_x[row[1]]
                                  if x < row[0]]) % 2]
        return points_inside

Next, maybe some testing against the original algorithm etc.

jaakkojulin commented 2 years ago

I saw some discussion on this already, but I just wanted to make sure that concave polygons will be supported, even if they are rarely used in practice. Concave polygons can be split into convex polygons, or detected so that a slower algorithm can be used as a fallback.
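For what it's worth, the ray-casting algorithm quoted above does handle concave shapes; here is a quick sketch with a hypothetical C-shaped selection (coordinates made up for illustration). A per-column min_y/max_y table for this shape would wrongly classify points in the notch as inside, since column x=1 spans y=0..10 on the boundary:

```python
def point_inside_polygon(point, poly):
    """Same ray-casting algorithm as in math_functions above,
    with the nested ifs collapsed into one chained condition."""
    x, y = point
    n = len(poly)
    inside = False
    p1x, p1y = poly[0]
    for i in range(n + 1):
        p2x, p2y = poly[i % n]
        if min(p1y, p2y) < y <= max(p1y, p2y) and x <= max(p1x, p2x):
            if p1y != p2y:
                xinters = (y - p1y) * (p2x - p1x) / (p2y - p1y) + p1x
            if p1x == p2x or x <= xinters:
                inside = not inside
        p1x, p1y = p2x, p2y
    return inside

# A "C" opening to the left: outer box minus a notch at 2 < y < 8, x < 8
c_shape = [(0, 0), (10, 0), (10, 10), (0, 10),
           (0, 8), (8, 8), (8, 2), (0, 2)]
```

Ray casting correctly reports a point in the notch (e.g. (1, 5)) as outside, while points in the arms and spine test as inside.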

I have used the same "point in polygon" algorithm in my C programs and I have found it to be fast. I guess Python adds a bit of overhead everywhere, and that slows it all down.

Also, algorithms that rely on the data being discrete (integers) may have been avoided intentionally in the original code, to keep support for list data in floating point (e.g. data preprocessed by some other software). Otherwise I think this code is some of the oldest in Potku and should be shown no mercy when rewriting :)

tleppala commented 2 years ago

The main idea in this rewrite of the algorithm is to precompute the intersection points for all applicable y values, plus the max x value for fast culling of points below, above, and to the right of the polygon. The list of x values is then used to find how many intersection points lie to the left of the x value of interest (odd = inside). Basically, the algorithm now makes all computations for each polygon beforehand and just uses the values from a table.

The algorithm should be quite fast in C too, but each intersection point for each leg of the polygon has to be computed (or at least tested to see whether it needs to be computed).

point_in_polygon is called once per tested point per selection (after the is_inside test). This amounts to some 450k tests (for 330k points, 13 selections). Also, 4.2 million calls to point_inside are made. Function calls in Python are costly (https://www.futurelearn.com/info/courses/python-in-hpc/0/steps/65124). is_inside would probably be somewhat faster using "and" instead of multiple ifs ("and" is short-circuited in Python). Well... let's test this: current solution: 4.2 million calls, 0.9s; and-version: 0.8s; with the terms arranged better, 0.7s. Not too much, but a 28% speedup is nice.
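A sketch of that is_inside change. The class and attribute names here are assumptions for illustration, not Potku's actual ones:

```python
class AxesLimits:
    """Toy stand-in for the selector's rectangular limits."""
    def __init__(self, x_min, x_max, y_min, y_max):
        self.x_min, self.x_max = x_min, x_max
        self.y_min, self.y_max = y_min, y_max

    def is_inside(self, point):
        # Chained comparisons joined with "and" short-circuit on the first
        # failing test; putting the most selective test first helps most.
        return (self.x_min <= point[0] <= self.x_max
                and self.y_min <= point[1] <= self.y_max)
```

The behavior is identical to nested ifs; the win is purely fewer bytecode branches per call, multiplied by millions of calls.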

tleppala commented 2 years ago

Changes implemented in pull request #271.