Open amueller opened 10 months ago
Here is the best I can do with pure NumPy:
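import numpy as np

def query_integral_image_np(integral_image, size_x, size_y, random_state):
    # Four-corner differences of the integral image, one value per candidate position.
    # Copy first: the slice is a view, so in-place ops would otherwise modify integral_image.
    area = integral_image[:-size_x, :-size_y].copy()
    area += integral_image[size_x:, size_y:]
    area -= integral_image[size_x:, :-size_y]
    area -= integral_image[:-size_x, size_y:]
    # Flat indices of every position whose window is empty
    flat_area_zero = np.flatnonzero(area == 0)
    hits = flat_area_zero.shape[0]
    if hits == 0:
        # no room left
        return None
    goal = random_state.randint(0, hits)
    flat_idx = flat_area_zero[goal]
    # flat_idx indexes `area`, so unravel it against area's shape
    row, col = np.unravel_index(flat_idx, area.shape)
    return row, col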
The Cython code is still about 2.6 times faster than this.
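For anyone who wants to sanity-check a vectorized version before timing it, here is a rough sketch that compares the sliced four-corner computation against a plain double loop on a small random mask. The helper names (`free_positions_np`, `free_positions_loop`) and the test mask are made up for illustration and are not part of word_cloud; the loop is only my assumption of what a straightforward reference would look like.

```python
import numpy as np

def free_positions_np(integral, size_x, size_y):
    """(row, col) pairs where the four-corner difference is zero, vectorized."""
    # Plain + and - build a new array, so `integral` is left untouched.
    area = (integral[:-size_x, :-size_y] + integral[size_x:, size_y:]
            - integral[size_x:, :-size_y] - integral[:-size_x, size_y:])
    # Flat indices refer to `area`, so unravel them against area.shape.
    rows, cols = np.unravel_index(np.flatnonzero(area == 0), area.shape)
    return list(zip(rows.tolist(), cols.tolist()))

def free_positions_loop(integral, size_x, size_y):
    """Same result with an explicit double loop (hypothetical reference)."""
    n_rows, n_cols = integral.shape
    out = []
    for i in range(n_rows - size_x):
        for j in range(n_cols - size_y):
            area = (integral[i, j] + integral[i + size_x, j + size_y]
                    - integral[i + size_x, j] - integral[i, j + size_y])
            if area == 0:
                out.append((i, j))
    return out

# Small random occupancy mask (illustrative data only).
rng = np.random.default_rng(0)
occupied = (rng.random((40, 30)) < 0.05).astype(np.int64)
integral = np.cumsum(np.cumsum(occupied, axis=0), axis=1)
before = integral.copy()

same = free_positions_np(integral, 4, 3) == free_positions_loop(integral, 4, 3)
unchanged = np.array_equal(integral, before)
print("results match:", same, "| integral image unchanged:", unchanged)
```

If the two lists ever differ, the usual suspects are unraveling the flat index against the wrong shape or mutating the integral image through a view.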
@thomasjpfan I kept getting an IndexError:
flat_idx = flat_area_zero[goal]
IndexError: index 635 is out of bounds for axis 0 with size 600
@thomasjpfan hm if there's now a way to do version-independent binary wheels maybe that's the better way forward? Also clearly your numpy skills are better than GPT4 ;) But maybe 2.6 times is fine also given the maintenance overhead?
@amueller I tried using ABI3 with word_cloud, but it will not work until the minimum supported Python version is 3.11. query_integral_image uses the buffer protocol, which only became part of the stable ABI in 3.11: https://docs.python.org/3/c-api/buffer.html#buffer-structure
Although even with 3.11 as the minimum version, Cython still uses non-stable C API calls. I suspect Cython's limited API support is not mature enough yet.
> But maybe 2.6 times is fine also given the maintenance overhead?
I cannot tell without running benchmarks to see how much slower generate_from_frequencies gets.
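A benchmark along these lines might be enough to answer that. This is only a sketch: `best_time` and `bench_generate` are hypothetical helpers, and the word list, image size, and seed are arbitrary assumptions. The idea would be to run it once against the Cython build and once against a NumPy-only build and compare the two numbers.

```python
import timeit

def best_time(fn, repeat=5):
    """Smallest wall-clock time in seconds over `repeat` single calls of fn."""
    return min(timeit.repeat(fn, repeat=repeat, number=1))

def bench_generate(n_words=200, seed=0):
    """Time WordCloud.generate_from_frequencies on synthetic frequencies."""
    # Imported lazily so best_time() stays usable without wordcloud installed.
    from wordcloud import WordCloud

    # Synthetic, roughly Zipf-like frequencies (assumption, not real data).
    freqs = {f"word{i}": 1.0 / (i + 1) for i in range(n_words)}
    wc = WordCloud(width=800, height=400, random_state=seed)
    return best_time(lambda: wc.generate_from_frequencies(freqs))

# Example (requires wordcloud):
#   print(f"best of 5: {bench_generate():.3f} s")
```

Taking the minimum over a few repeats reduces noise from other processes; a fixed `random_state` keeps the layout work comparable between runs.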
It would be really nice to get rid of the Cython code. The equivalent in numpy is
however, that seems to be much slower. Not sure if there's a faster way to do it. cc @thomasjpfan