cython / cython

The most widely used Python to C compiler
https://cython.org
Apache License 2.0
9.51k stars 1.49k forks source link

[BUG] Memoryview clear unnecessarily acquiring GIL #5670

Closed djhoese closed 1 year ago

djhoese commented 1 year ago

Describe the bug

Full context: https://stackoverflow.com/questions/76972950/cython-returned-memoryview-is-always-considered-uninitialized/76974615

I'm upgrading to Cython 3 and noticing a few new yellow areas in the annotation HTML that make it look like the GIL is being acquired more than it used to and in cases that I believe it is unnecessary.

It has been explained to me that the the function in my code below that returns a memory view is only acquiring the GIL in the error case. I still think with all the compiler flags I have that this shouldn't happen (I know my memoryview is initialized, I know the size of it, I know this is a read-only not-owned subset view of the memory), but I can live with it since it theoretically shouldn't have a significant effect on performance.

However, in the calling function _run I noticed that it is always acquiring the GIL to clear the memory view even in the success case:

  /* function exit code */
  goto __pyx_L0;
  __pyx_L1_error:;
  #ifdef WITH_THREAD
  __pyx_gilstate_save = __Pyx_PyGILState_Ensure();
  #endif
  __PYX_XCLEAR_MEMVIEW(&__pyx_t_1, 1);
  __Pyx_WriteUnraisable("cython_test._run", __pyx_clineno, __pyx_lineno, __pyx_filename, 1, 0);
  #ifdef WITH_THREAD
  __Pyx_PyGILState_Release(__pyx_gilstate_save);
  #endif
  __pyx_L0:;
  #ifdef WITH_THREAD
  __pyx_gilstate_save = __Pyx_PyGILState_Ensure();
  #endif
  __PYX_XCLEAR_MEMVIEW(&__pyx_v_tmp, 1);
  #ifdef WITH_THREAD
  __Pyx_PyGILState_Release(__pyx_gilstate_save);
  #endif

In profiling my (dask-based) code it is very clear that this is preventing my threads from processing data simultaneously so this one hurts more than the failure case of the other function.

I understand that memoryview handling has changed a lot in Cython 3.x, but Cython 0.29.x did not have this GIL acquire code here and it passed , 0) to the __PYX_X_CLEAR_MEMVIEW function.

Code to reproduce the behaviour:

# cython: language_level=3, boundscheck=False, cdivision=True, wraparound=False, initializedcheck=False, nonecheck=False
cimport cython
from cython cimport floating
import numpy as np
cimport numpy as np

np.import_array()

def test_func():
    cdef np.ndarray[float, ndim=2] arr = np.zeros((5, 5), dtype=np.float32)
    cdef float[:, ::1] arr_view = arr
    _run(arr_view)

cdef void _run(floating[:, ::1] arr_view) noexcept nogil:
    cdef floating[:, :] tmp = _get_upper_left_corner(arr_view)

cdef inline floating[:, :] _get_upper_left_corner(floating[:, ::1] arr) noexcept nogil:
    return arr[:1, :1]

Expected behaviour

Minimal (no?) GIL acquiring when working with memoryviews.

OS

Linux

Python version

3.11

Cython version

3.0.0

Additional context

Installed from conda-forge.

djhoese commented 1 year ago

I also wanted to mention that if this isn't intentional, but there is no time to figure out why this is happening I'm willing to track it down if someone could just point me in the right direction on where to look in the code base.

da-woods commented 1 year ago

I'm pretty sure it isn't intentional (at least in the success path - the error path needs the GIL for writeunraisable so may as well have it for the whole thing).

My usual approach to working out where to look is to add a conditional break point in CCodeWriter.write - conditional on the string you think this wrong (probably __PYX_XCLEAR_MEMVIEW(&__pyx_v_tmp, 1); and this'll take you to where the code is generated. You'll probably need to step up several levels until you arrive in Nodes.py and that's probably the right place.

matusvalo commented 1 year ago

My usual approach to working out where to look is to add a conditional break point in CCodeWriter.write - conditional on the string you think this wrong (probably PYX_XCLEAR_MEMVIEW(&pyx_v_tmp, 1); and this'll take you to where the code is generated. You'll probably need to step up several levels until you arrive in Nodes.py and that's probably the right place.

I was able to dig out the code. So the following C line:

__PYX_XCLEAR_MEMVIEW(&__pyx_v_tmp, 1);

Is generated via following stacktrace:

  /Users/matus/dev/cython39/bin/cython(33)<module>()
-> sys.exit(load_entry_point('Cython', 'console_scripts', 'cython')())
  /Users/matus/dev/cython/Cython/Compiler/Main.py(757)setuptools_main()
-> return main(command_line = 1)
  /Users/matus/dev/cython/Cython/Compiler/Main.py(785)main()
-> result = compile(sources, options)
  /Users/matus/dev/cython/Cython/Compiler/Main.py(676)compile()
-> return compile_multiple(source, options)
  /Users/matus/dev/cython/Cython/Compiler/Main.py(650)compile_multiple()
-> result = run_pipeline(source, options,
  /Users/matus/dev/cython/Cython/Compiler/Main.py(542)run_pipeline()
-> err, enddata = Pipeline.run_pipeline(pipeline, source)
  /Users/matus/dev/cython/Cython/Compiler/Pipeline.py(398)run_pipeline()
-> data = run(phase, data)
  /Users/matus/dev/cython/Cython/Compiler/Pipeline.py(375)run()
-> return phase(data)
  /Users/matus/dev/cython/Cython/Compiler/Pipeline.py(52)generate_pyx_code_stage()
-> module_node.process_implementation(options, result)
  /Users/matus/dev/cython/Cython/Compiler/ModuleNode.py(222)process_implementation()
-> self.generate_c_code(env, options, result)
  /Users/matus/dev/cython/Cython/Compiler/ModuleNode.py(516)generate_c_code()
-> self.body.generate_function_definitions(env, code)
  /Users/matus/dev/cython/Cython/Compiler/Nodes.py(404)generate_function_definitions()
-> stat.generate_function_definitions(env, code)
  /Users/matus/dev/cython/Cython/Compiler/Nodes.py(404)generate_function_definitions()
-> stat.generate_function_definitions(env, code)
  /Users/matus/dev/cython/Cython/Compiler/FusedNode.py(964)generate_function_definitions()
-> stat.generate_function_definitions(env, code)
  /Users/matus/dev/cython/Cython/Compiler/Nodes.py(2368)generate_function_definitions()
-> code.put_var_xdecref(entry, have_gil=gil_owned['success'])
  /Users/matus/dev/cython/Cython/Compiler/Code.py(2260)put_var_xdecref()
-> self.put_xdecref(entry.cname, entry.type, **kwds)
  /Users/matus/dev/cython/Cython/Compiler/Code.py(2215)put_xdecref()
-> type.generate_xdecref(self, cname, nanny=nanny, have_gil=have_gil)
> /Users/matus/dev/cython/Cython/Compiler/PyrexTypes.py(1138)generate_xdecref()
-> code.putln("__PYX_XCLEAR_MEMVIEW(&%s, %d);" % (cname, int(have_gil)))

So the following GIL lock code:

  __pyx_L1_error:;
  #ifdef WITH_THREAD
  __pyx_gilstate_save = __Pyx_PyGILState_Ensure();
  #endif
  __PYX_XCLEAR_MEMVIEW(&__pyx_t_1, 1);
  __Pyx_WriteUnraisable("test._run", __pyx_clineno, __pyx_lineno, __pyx_filename, 1, 0);
  #ifdef WITH_THREAD
  __Pyx_PyGILState_Release(__pyx_gilstate_save);
  #endif
  __pyx_L0:;
  #ifdef WITH_THREAD
  __pyx_gilstate_save = __Pyx_PyGILState_Ensure();
  #endif
  __PYX_XCLEAR_MEMVIEW(&__pyx_v_tmp, 1);
  #ifdef WITH_THREAD
  __Pyx_PyGILState_Release(__pyx_gilstate_save);
  #endif

is generated here:

https://github.com/cython/cython/blob/36a741a10db8114c1eca1753aea7be92c5f3a639/Cython/Compiler/Nodes.py#L2363-L2368

where by deleting assure_gil('success') the GIL code is avoided and the generated C code looks as follows:

  __pyx_L1_error:;
  #ifdef WITH_THREAD
  __pyx_gilstate_save = __Pyx_PyGILState_Ensure();
  #endif
  __PYX_XCLEAR_MEMVIEW(&__pyx_t_1, 1);
  __Pyx_WriteUnraisable("test._run", __pyx_clineno, __pyx_lineno, __pyx_filename, 1, 0);
  #ifdef WITH_THREAD
  __Pyx_PyGILState_Release(__pyx_gilstate_save);
  #endif
  __pyx_L0:;
  __PYX_XCLEAR_MEMVIEW(&__pyx_v_tmp, 0);

But I have a several questions:

  1. The GIL locking is necessary for __PYX_XCLEAR_MEMVIEW or not?
  2. What is the purpose of WITH_THREAD macro? I don't see it anywhere declared (even not during compilation) so it seems that locking is not compiled in any case...
da-woods commented 1 year ago
  1. The GIL locking is necessary for __PYX_XCLEAR_MEMVIEW or not?

No - __PYX_XCLEAR_MEMVIEW will reacquire the GIL itself if necessary, which is only when its internal atomic reference count goes to 0. The second argument (0 or 1) tells it whether it already has the GIL.

However - note that if entry.type.needs_refcounting: covers multiple different types (currently just PyObject + memoryview, but maybe more in future) and only memoryview can get away without the GIL.

  1. What is the purpose of WITH_THREAD macro? I don't see it anywhere declared (even not during compilation) so it seems that locking is not compiled in any case...

It's defined in the Python headers (https://github.com/python/cpython/blob/d48760b2f1e28dd3c1a35721939f400a8ab619b8/Include/pyport.h#L663). It looks like it's pointless now since it's always defined, but before Python 3.7 it used to be informative.