Seems like there's a nasty nested loop in Measurement.save_cuts(), where the innermost functions (Selection.get_points() and math_functions.point_inside_polygon()) are slow.
Measurement.save_cuts()
    # Go through all points in measurement data
    data_count = len(self.data)
    for n in range(data_count):  # while n < data_count:
        if n % 5000 == 0:
            # Do not always update UI to make it faster.
            if progress is not None:
                progress.report(n / data_count * 80)
        point = self.data[n]
        # Check if point is within selectors' limits for faster processing.
        if not self.selector.axes_limits.is_inside(point):
            continue
        for i, selection in enumerate(self.selector.selections):
            if selection.point_inside(point):
                points_in_selection[i].append(point)
Selection.point_inside()
    def point_inside(self, point):
        """Check if point is inside selection.

        Args:
            point: [X, Y] representing a point.

        Return:
            Returns True if point is within selection. False otherwise.
        """
        if not self.axes_limits.is_inside(point):
            return False
        inside = mf.point_inside_polygon((point[0], point[1]),
                                         self.get_points())
        # While at it, increase event point counts if not counted already.
        if inside and not self.events_counted:
            self.event_count += 1
        return inside
Selection.get_points()
    def get_points(self):
        """Get points in selection.

        Get points in selection in list. Format: ((x1,y1), (x2,y2), ...).
        If no points, an empty list is returned.

        Return:
            ((x1, y1), (x2, y2), ...)
        """
        points = self.points.get_data()
        pointlist = []
        for i in range(len(points[0])):
            pointlist.append([points[0][i], points[1][i]])
        return pointlist
math_functions.point_inside_polygon()
    def point_inside_polygon(point, poly):
        """Finds out if a point x, y is inside a polygon "poly".

        Determine if a point is inside a given polygon or not.
        Polygon is a list of (x,y) pairs.

        Algorithm got from: http://www.ariel.com.au/a/python-point-int-poly.html
        """
        # TODO test using shapely or numpy for this
        x, y = point
        n = len(poly)
        inside = False
        p1x, p1y = poly[0]
        for i in range(n + 1):
            p2x, p2y = poly[i % n]
            if y > min(p1y, p2y):
                if y <= max(p1y, p2y):
                    if x <= max(p1x, p2x):
                        if p1y != p2y:
                            xinters = (y - p1y) * (p2x - p1x) / (p2y - p1y) + p1x
                        if p1x == p2x or x <= xinters:
                            inside = not inside
            p1x, p1y = p2x, p2y
        return inside
There's a TODO (# TODO test using shapely or numpy for this) in math_functions.point_inside_polygon(), but I think Selection.get_points() is a problem too. This could be faster if Selection.get_points() were cached with lru_cache, or maybe just called once for each selection in Measurement.save_cuts() and then passed to Selection.point_inside(). We could also save the result of Selection.get_points() in the Selection object and update it when it changes (or when it's needed), but that has the potential for cache invalidation issues.
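For illustration, the lru_cache variant could look roughly like this (a sketch, not actual Potku code; on_points_changed is a hypothetical hook, and the comments note the invalidation caveats):

    from functools import lru_cache

    class Selection:
        @lru_cache(maxsize=None)
        def get_points(self):
            # Same conversion as before, done once per cache entry.
            xs, ys = self.points.get_data()
            return [[x, y] for x, y in zip(xs, ys)]

        def on_points_changed(self):
            # Hypothetical hook: clear the cache whenever the selection is
            # edited, otherwise point_inside() keeps using stale points.
            # Note: cache_clear() empties the cache for *all* instances,
            # and caching on self keeps instances alive; these are part of
            # the cache invalidation issues mentioned above.
            self.get_points.cache_clear()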
Oh, and profiling should reveal the real culprit. My last message is just an educated guess.
Most time is spent (total 22 s):

    tottime  cumtime  function
    5 s      14.8 s   selection.py: point_inside
    4.2 s    19 s     selection.py: update_selection_points
    3.2 s    4.4 s    math_functions.py: point_inside_polygon
    2.3 s    3.5 s    selections.py: get_points
    1.9 s    1.9 s    selections.py: is_inside
Using arrays for max_y and min_y (8k elements each), one comparison (using the point's x coordinate as the index) would be equivalent to checking the x_min and x_max limits (and also y_min or y_max), and a second comparison would tell whether the point is inside the selection or not. This method should be faster than even the is_inside method (4 comparisons and nested ifs).

The additions to selection objects would be the y_min and y_max arrays and a method to compute them when a selection is made or loaded. After this, checking a single point is just two comparisons away (point_y < y_max[point_x] and point_y > y_min[point_x]). Default values: y_max = -1 and y_min = max_y_value + 1, so empty columns reject every point. As all data is integer-valued with specified ranges, there is no need to make use of polygons when comparing points to selections. This method is also well suited to polygons with other than straight segments; see the sketch below.

Also, if using max_x and min_x arrays is feasible, that would require arrays of only 4k elements.
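A minimal sketch of this idea, assuming 8192 discrete channels and integer point coordinates (names like build_y_limits are illustrative, not from the codebase):

    import math

    NUM_CHANNELS = 8192  # assumed discrete range of the data

    def build_y_limits(poly):
        """Precompute per-x y_min/y_max limits from a selection polygon.

        Defaults are chosen so that untouched x columns reject all points:
        y_max = -1 and y_min = max y value + 1.
        """
        y_min = [NUM_CHANNELS] * NUM_CHANNELS
        y_max = [-1] * NUM_CHANNELS
        n = len(poly)
        p1x, p1y = poly[0]
        for i in range(1, n + 1):
            p2x, p2y = poly[i % n]
            if p1x != p2x:
                # Walk every integer x column this edge covers and record
                # the edge's y value there.
                for x in range(math.ceil(min(p1x, p2x)),
                               math.floor(max(p1x, p2x)) + 1):
                    y = (x - p1x) * (p2y - p1y) / (p2x - p1x) + p1y
                    y_min[x] = min(y_min[x], y)
                    y_max[x] = max(y_max[x], y)
            p1x, p1y = p2x, p2y
        return y_min, y_max

    def point_in_selection(point, y_min, y_max):
        # Two comparisons per point. Only valid when the selection has a
        # single y interval per x column (no C-shapes).
        x, y = point
        return y_min[x] <= y <= y_max[x]

Note that this only works when every x column intersects the selection in a single y interval, which is exactly the limitation raised below.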
Your proposed algorithm seems reasonable, but I'm worried about its limitations. While normal L-shaped(ish) selections are alright, if a selection is C-shaped or similar, it wouldn't work. This kind of limitation is difficult to convey to users. Maybe this could work with the split selection workaround, but sounds like that would get complicated fast.
Looks like there's currently quite a bit of function call overhead involved. Selection.point_inside() takes 5 s, even though the method itself doesn't look computationally intensive (assuming I'm reading the profiling results right and that 5 s doesn't include the inner function calls). Maybe something like this would reduce the overhead enough without changing the algorithm:
    # Formerly point_inside(self, point)
    def points_inside(self, points):
        polygon = self.get_points()
        matching_points = []
        for point in points:
            # This is checked in save_cuts too, remove it from there?
            if not self.axes_limits.is_inside(point):
                continue
            if mf.point_inside_polygon((point[0], point[1]), polygon):
                matching_points.append(point)
                if not self.events_counted:
                    self.event_count += 1
        return matching_points
This would need to be called len(self.selector.selections) times, instead of len(self.data) * len(self.selector.selections) times.
Multithreading could also help, if we aren't using it already.
tottime 4.2 s, cumtime 10.1 s: points_inside (formerly point_inside)
Multithreading might not be as helpful in this case: it would be the easier option, but the GIL gets in the way. With multiprocessing, copying the data to a new process might take some time (or maybe split the data between multiple processes at the beginning and just ask each for different selections), and it might be more work than it's worth.
Using arrays with difficult shapes might get a tad interesting, but the difficulty is only in creating the divisions; the effect on the computational part is minimal.
I'm also wondering if prescreening points with the is_inside method is really necessary, because point_inside_polygon screens them yet again... only the min_x screening is not done again, but that should be an easy addition.
Another thing that should help quite a bit in any case would be to make fewer function calls... With some 300k data points, the top 10 (tottime) functions account for some 22.5 million function calls.
> Multithreading might not be as helpful in this case: it would be the easier option, but the GIL gets in the way. With multiprocessing, copying the data to a new process might take some time (or maybe split the data between multiple processes at the beginning and just ask each for different selections), and it might be more work than it's worth.
I think per-selection processes/threads are for the best. Going smaller than that is not useful because there are usually several selections anyway. Multiprocessing/threading is easy to implement with ThreadPoolExecutor or ProcessPoolExecutor in Measurement.save_cuts():
    # with concurrent.futures.ProcessPoolExecutor() as executor:  # Choose this for processes
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [executor.submit(s.points_inside, points)
                   for s in self.selector.selections]
        points_in_selection = [future.result() for future in futures]
> I'm also wondering if prescreening points with the is_inside method is really necessary, because point_inside_polygon screens them yet again... only the min_x screening is not done again, but that should be an easy addition.
Yeah, that may be redundant.
> Another thing that should help quite a bit in any case would be to make fewer function calls... With some 300k data points, the top 10 (tottime) functions account for some 22.5 million function calls.
Agreed. It's faster to process entire arrays (of some kind) instead of individual elements.
We could also look into using faster runtimes like Numba, NumPy ufuncs or ctypes. However, these make the build process more complicated and they are harder to debug/write, so they are more of a last resort. They are good options to keep in mind though, in case there are enough slow parts in the code to speed up.
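For example, a minimal NumPy sketch of the same even-odd test, processing a whole (N, 2) array of points per polygon edge instead of one point per call (an assumption about the data layout, not existing Potku code):

    import numpy as np

    def points_inside_polygon(points, poly):
        """Vectorized even-odd test: points is an (N, 2) array, poly is an
        (M, 2) array of polygon vertices. Returns a boolean mask of the
        points that fall inside the polygon."""
        points = np.asarray(points, dtype=float)
        poly = np.asarray(poly, dtype=float)
        x, y = points[:, 0], points[:, 1]
        inside = np.zeros(len(points), dtype=bool)
        for (p1x, p1y), (p2x, p2y) in zip(poly, np.roll(poly, -1, axis=0)):
            if p1y == p2y:
                continue  # horizontal edges never toggle the parity
            # Which points' scanlines does this edge straddle?
            straddles = (y > min(p1y, p2y)) & (y <= max(p1y, p2y))
            # x coordinate where the edge crosses each scanline
            xinters = (y - p1y) * (p2x - p1x) / (p2y - p1y) + p1x
            inside ^= straddles & (x <= xinters)
        return inside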
Here is a quick working version of the best of both worlds, using the original algorithm with cached points: 1.4 s tottime, 1.6 s cumtime. And as predicted, faster than even just the Selection.is_inside code (1.8 s).
    def tl_calculate_values(self, poly):
        x_list = []
        x_max_list = []
        for i in range(8192):  # max size of list
            x_list.append([])
            x_max_list.append(-1)
        n = len(poly)
        p1x, p1y = poly[0]
        for i in range(n + 1):
            p2x, p2y = poly[i % n]
            if p1y != p2y:
                # Record an intersection for every integer y this edge covers.
                for y in range(math.ceil(min(p1y, p2y)),
                               math.floor(max(p1y, p2y)) + 1):
                    xinters = (y - p1y) * (p2x - p1x) / (p2y - p1y) + p1x
                    if x_list[y] == [9999]:
                        x_list[y] = [xinters]
                    else:
                        x_list[y].append(xinters)
            p1x, p1y = p2x, p2y
        for i in range(len(x_list)):
            if x_list[i] != []:
                x_list[i].sort(reverse=True)
                x_max_list[i] = x_list[i][0]
            if x_list[i] == []:
                x_list[i] = [9999]
        return x_list, x_max_list
    def tl_points_inside(self, points):
        if self.cached_points != self.get_points():
            self.cached_points = self.get_points()
            self.tl_cached_x, self.tl_cached_x_max = self.tl_calculate_values(self.cached_points)
        points_inside = []
        for point in points:
            inside = False
            if point[0] <= self.tl_cached_x_max[point[1]]:
                for i in range(len(self.tl_cached_x[point[1]])):
                    if point[0] > self.tl_cached_x[point[1]][i]:
                        inside = not inside
            if inside:
                points_inside.append(point)
        return points_inside
A little more tuning: tottime 0 s and cumtime 0.9 s (0.76 s without the tl_calculate_values time):
    def tl_points_inside(self, points):
        if self.cached_points != self.get_points():
            self.cached_points = list(self.get_points())
            self.tl_cached_x, self.tl_cached_x_max = self.tl_calculate_values(self.cached_points)
        points_inside = [row for row in points
                         if row[0] < self.tl_cached_x_max[row[1]] and
                         len([x for x in self.tl_cached_x[row[1]] if x < row[0]]) % 2]
        return points_inside
Next, maybe some testing against the original algorithm, etc. A quick comparison harness is sketched below.
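For example, something like this (hypothetical names; max_value assumes the 8k discrete range mentioned earlier, and mf is the math_functions module as imported in selection.py):

    import random

    def compare_with_original(selection, num_points=100_000, max_value=8191):
        # Run both implementations on the same random integer points and
        # report disagreements (e.g. points exactly on a polygon edge).
        points = [(random.randint(0, max_value), random.randint(0, max_value))
                  for _ in range(num_points)]
        poly = selection.get_points()
        fast = set(map(tuple, selection.tl_points_inside(points)))
        mismatches = [p for p in points
                      if (p in fast) != mf.point_inside_polygon(p, poly)]
        print(f"{len(mismatches)} of {num_points} points disagree")
        return mismatches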
I saw some discussion on this already, but I just wanted to make sure that concave polygons are supported, even if rarely used in practice. Concave polygons can be split into convex polygons, or detected so that we fall back to a slower algorithm.
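Detecting concavity is cheap, if we want the fallback route; a sketch (not from the codebase) that checks whether all consecutive edge cross products share a sign:

    def is_convex(poly):
        """Return True if the polygon (list of (x, y) pairs) is convex.

        Convex selections could use the fast array method; concave ones
        would fall back to the generic point-in-polygon test.
        """
        n = len(poly)
        sign = 0
        for i in range(n):
            x1, y1 = poly[i]
            x2, y2 = poly[(i + 1) % n]
            x3, y3 = poly[(i + 2) % n]
            # z component of the cross product of consecutive edges
            cross = (x2 - x1) * (y3 - y2) - (y2 - y1) * (x3 - x2)
            if cross != 0:
                if sign == 0:
                    sign = 1 if cross > 0 else -1
                elif (cross > 0) != (sign > 0):
                    return False  # turn direction changed, so concave
        return True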
I have used the same "point in polygon" algorithm in my C programs and I have found it to be fast. I guess Python adds a bit of overhead everywhere and this all slows it down.
Also, algorithms that rely on the data being discrete (integers) may have been avoided intentionally, to keep support for the original list data in floating point (e.g. data preprocessed by some other software). Otherwise I think this code is some of the oldest in Potku and should be shown no mercy when rewriting :)
The main idea in this rewrite of the algorithm is to precompute the intersection points for all applicable y values (plus the max x value, for fast culling of points below, above and to the right of the polygon). The list of x values is then used to find how many intersection points lie to the left of the x value of interest (odd = inside). Basically, the algorithm now makes all the computations for each polygon beforehand and just uses the values from a table.
The algorithm should be quite fast in C too, but each intersection point for each leg of the polygon has to be computed (or at least tested to see whether it needs to be computed).
point_inside_polygon is called once per tested point per selection (after the is_inside test). This amounts to some 450k tests (for 330k points, 13 selections). Also, 4.2 million calls to point_inside are made. Function calls in Python are costly (https://www.futurelearn.com/info/courses/python-in-hpc/0/steps/65124). is_inside would probably be somewhat faster using a single and expression instead of multiple ifs (and is short-circuited in Python). Well... let's test this. Current solution: 4.2 million calls, 0.9 s; and-version: 0.8 s; with the terms arranged better, 0.7 s. Not too much, but a 28% speedup is nice.
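For reference, the and-version might look roughly like this (the limit attribute names are guesses, not Potku's actual ones); putting the most selective comparison first lets the short circuit kick in sooner:

    def is_inside(self, point):
        # One short-circuited expression instead of nested ifs: `and`
        # stops evaluating at the first failing comparison.
        return (self.min_x <= point[0] <= self.max_x and
                self.min_y <= point[1] <= self.max_y)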
Changes implemented in pull request #271.
A generic algorithm is used to check whether a point is inside a selection polygon. The algorithm is called about num_points x selections times when points in cuts are tested (and the algorithm also tests each side of the polygon). As the data is discrete with 8k steps, a more efficient approach might be to convert the selection polygons to max_y and min_y arrays. Doing so, testing whether a point is inside a selection requires at most 2 comparisons (using full arrays).

A limitation of this method is that there can't be multiple regions at a single x (or y, if the other direction is used) position. If this is deemed a problem, the method can be refined for such eventualities (maybe split the affected selections into multiple pieces to remove the problem areas). Also, this conversion needs to be made only once.