amueller / word_cloud

A little word cloud generator in Python
https://amueller.github.io/word_cloud
MIT License
10.1k stars 2.31k forks source link

Maybe get rid of Cython #741

Open amueller opened 10 months ago

amueller commented 10 months ago

It would be really nice to get rid of the Cython code. The equivalent in numpy is

def query_integral_image(integral_image, size_x, size_y, random_state):
    x, y = integral_image.shape

    # calculate all areas
    i, j = np.ogrid[:x-size_x, :y-size_y]
    area = integral_image[i, j] + integral_image[i+size_x, j+size_y] - integral_image[i+size_x, j] - integral_image[i, j+size_y]
    zero_areas = np.argwhere(area == 0)

    if zero_areas.size == 0:
        # no room left
        return None

    # pick a location at random
    goal = random_state.choice(zero_areas.shape[0])
    return tuple(zero_areas[goal])

however, that seems to be much slower. Not sure if there's a faster way to do it. cc @thomasjpfan

thomasjpfan commented 10 months ago

Here is the best I can do with pure NumPy:

def query_integral_image_np(integral_image, size_x, size_y, random_state):
    area = integral_image[:-size_x, :-size_y]
    area += integral_image[size_x:,size_y:]
    area -= integral_image[size_x:, :-size_y]
    area -= integral_image[:-size_x, size_y:]
    flat_area_zero = np.flatnonzero(area == 0)
    hits = flat_area_zero.shape[0]

    if hits == 0:
        # no room left
        return None

    goal = random_state.randint(0, hits)
    flat_idx = flat_area_zero[goal]

    w, h = integral_image.shape
    row = flat_idx // w
    col = flat_idx - row * w
    return row, col

The Cython code still is ~ 2.6 times faster than this.

ccbogel commented 10 months ago

@thomasjpfan I kept getting an IndexError:

flat_idx = flat_area_zero[goal]

IndexError: index 635 is out of bounds for axis 0 with size 600

amueller commented 10 months ago

@thomasjpfan hm if there's now a way to do version-independent binary wheels maybe that's the better way forward? Also clearly your numpy skills are better than GPT4 ;) But maybe 2.6 times is fine also given the maintenance overhead?

thomasjpfan commented 10 months ago

@amueller I tried using ABI3 with word_cloud, but it will not work until the minimum support Python version is 3.11. query_integral_image uses the Buffer protocol, which became stable in 3.11: https://docs.python.org/3/c-api/buffer.html#buffer-structure

Although even with 3.11 as minimum version, Cython still uses non-stable CAPI calls. I suspect Cython's limited ABI support is not mature enough yet.

But maybe 2.6 times is fine also given the maintenance overhead?

I cannot tell without running benchmarks to see how much slower generate_from_frequencies gets.