PyWavelets / pywt

PyWavelets - Wavelet Transforms in Python
http://pywavelets.readthedocs.org
MIT License
1.97k stars 460 forks source link

CI: Add CI to test on free-threaded Python #753

Closed andfoy closed 6 days ago

andfoy commented 1 week ago

This PR introduces a CI to run the test suite on Linux against the free-threaded distribution of Python 3.13. This effort follows the recently added testing and wheel distributions of NumPy and SciPy.

andfoy commented 1 week ago

This is passing now on my fork: https://github.com/andfoy/pywt/actions/runs/9590437998

rgommers commented 6 days ago

I pushed a couple of tiny cleanups to make the diff even smaller.

andfoy commented 6 days ago

After a more detailed exploration, I compared the output C files produced by Cython under Python 3.12 and 3.13. In particular, the issue comes from the following trace:

  1. A ndarray strides are obtained by calling PyArray_STRIDES on an array cA. https://github.com/PyWavelets/pywt/blob/b63ff616f2e39d5780961eb15f9efebe3f41729a/pywt/_extensions/_swt.pyx#L204 For simplification process, lets call this variable raw_strides:
    __pyx_v_raw_strides = ((npy_intp *)__pyx_f_5numpy_7ndarray_7strides_strides(__pyx_v_cA));
  2. Then, cA is redefined again by calling empty: https://github.com/PyWavelets/pywt/blob/b63ff616f2e39d5780961eb15f9efebe3f41729a/pywt/_extensions/_swt.pyx#L251

Such redefinition implies the creation of an intermediate variable that contains the array, which is then assigned to the original cA variable by means of Py_DECREF:

__Pyx_DECREF_SET(__pyx_v_cA, ((PyArrayObject *)__pyx_t_15));

On Python 3.12, calling Py_DECREF, for some reason, does not affect the values pointed by __pyx_v_raw_strides, which means that the memory allocated to store the strides (or shape) is not being released when an array is redefined.

The bug (or feature) that we are now experiencing on Python 3.13 is that, when the original array object is deleted (as its refcount hits zero), its attributes also are deleted. In this case, since PyArray_STRIDES yields a pointer to the strides values in memory and not a copy, once cA goes out of scope, the values pointed to any of the strides and shape attributes are being released with the object, which was different from the behaviour experienced in 3.12.

The main question here is, should any access to the attributes of an object imply an increase of the refcount, or is this behaviour expected and it should be taken care of explicitly when writing Cython code?

It is important to mention that this happens regardless of the setting of the PYTHON_GIL environment variable.

cc @lysnikolaou @da-woods

da-woods commented 6 days ago

I suspect this change is because the memory allocator got changed in Python 3.13 (if nothing else uses the memory that strides points to then you may get away with accessing it, even if it's technically not allowed).

I think you need to account for the reference counting explicitly. As far as Cython is concerned strides and shape are just arbitrary int pointers and it knows nothing about the underlying ownership.

rgommers commented 6 days ago

Thanks for digging in @andfoy, and thanks for confirming @da-woods! Then this code was always technically buggy, we were just lucky that it worked until now.

I'll have a last look at this, and then we'll get this in.

rgommers commented 6 days ago

Merged, nice work @andfoy. Looks like the next step is to start uploading Linux and macOS wheels - would you be able to open a new PR for that?