Open paultiq opened 4 days ago
There was a short discussion on the users' group last week (https://groups.google.com/g/cython-users/c/O0nYBTTwb_Y). Only a very small amount has changed since then.
In summary, you can run Python in a way where it ignores the modules' declared compatibility. If you build your Cython module with the module state flag enabled, then it's marginally possible that it might work. I don't think we've currently tested it or even put any thought into how to test it.
I expect there to be some pretty serious limitations even when finished. For example, a "with gil:" block is something that I expect can never be made to work.
Thank you for the prompt reply and awesome work.
I get that user-defined global variables will introduce some indeterminable issues and footguns. That seems inevitable, but also in our control.
My question, iiuc, is really about CYTHON_USE_MODULE_STATE and two-phase init. You said it's a "long term project"... does that suggest it's a long way from being even testable?
(and, if this is the wrong forum, I'm happy to just reply in the user group thread)
CYTHON_USE_MODULE_STATE and two-phase init.
I don't think you necessarily need that - I think single phase init might well work if you turn off the check that bans it. I certainly wouldn't dismiss trying it in favour of waiting.
But in terms of two-phase init, we've done some restructuring so most data (except global cdef variables right now) is stored in a "module state struct" (in any compilation mode; there's just a single global variable instance of the struct most of the time). But the one thing we haven't done is implement any of the module lookup mechanisms.
So I can absolutely promise that won't work right now.
Thanks, I did a little testing with a cythonized module to start, and my computer didn't blow up. Just wanted to document the test case here for other people, including how to disable the multi_interp_extensions_check.
After recompiling with CYTHON_USE_MODULE_STATE = 1, I ran the following code and the results were correct.
Both fib, and a function using a global variable (a list), worked as expected. The global variable was only updated within the scope of the subinterpreter (each subinterpreter had a separate list).
Define this in a separate module to disable the multi_interp_extensions_check:
def disable_multi_interp_extensions_check():
    import _imp
    _imp._override_multi_interp_extensions_check(-1)
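A slightly more defensive variant of the helper above might look like the following sketch. The name try_disable_multi_interp_extensions_check is hypothetical, and the error handling rests on assumptions: that the private _imp function may be missing on some Python versions, and that the call may be refused with a RuntimeError in some contexts (for example outside a subinterpreter). Neither was verified in this thread.

```python
import _imp


def try_disable_multi_interp_extensions_check() -> bool:
    """Return True if the override was applied, False otherwise."""
    # The override is a private CPython helper, so look it up defensively.
    override = getattr(_imp, "_override_multi_interp_extensions_check", None)
    if override is None:
        # Assumption: older Python versions do not provide this helper.
        return False
    try:
        override(-1)  # same value as in the helper above
    except RuntimeError:
        # Assumption: the call can be refused in some contexts.
        return False
    return True
```

Passing this function as the initializer would behave like the original helper when the override succeeds; the return value is ignored by the executor and is only useful for manual checks.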
try:
    from concurrent.futures.interpreter import InterpreterPoolExecutor  # 3.14+
except ModuleNotFoundError:
    # Backport / https://pypi.org/project/interpreters-pep-734/
    from interpreters_backport.concurrent.futures.interpreter import InterpreterPoolExecutor
from mycythonmodule import fib
from mydisablemodule import disable_multi_interp_extensions_check

with InterpreterPoolExecutor(max_workers=5, initializer=disable_multi_interp_extensions_check) as executor:
    ipe_r = executor.map(fib, range(100))
    print(list(ipe_r))
def fib(n: int) -> int:
    a, b = 0, 1
    while b < n:
        a, b = b, a + b
    return a
import _interpreters

SOMEGLOBAL = []

def uses_global(x) -> tuple:
    SOMEGLOBAL.append(x)
    # returns (subinterpreter id, number of items in this interpreter's list)
    return _interpreters.get_current(), len(SOMEGLOBAL)
Thanks for testing - that's good to know. It also bodes well for when we do manage to make it work "properly" since much of it will remain the same.
For what it's worth I think it'd have failed if you'd made SOMEGLOBAL a cdef variable.
Yah, indeed. The per-subinterpreter consistency goes away with cdefs.
cdef list
cdef list SOMEGLOBAL5 = []

def cdef_list_mod(x):
    SOMEGLOBAL5.append(x)
    print(_interpreters.get_current(), SOMEGLOBAL5)
Inconsistent result with a cdef list (inconsistent as in: subinterpreters see side effects from other subinterpreters):
(2, 5) [1]
(2, 5) [1, 2]
(2, 5) [3]
(2, 5) [3, 4]
(2, 5) [3, 4, 5]
(2, 5) [3, 4, 5, 6]
(2, 5) [3, 4, 5, 6, 7]
(2, 5) [3, 4, 5, 6, 7, 9]
(10, 5) [10]
(3, 5) [12]
(6, 5) [12, 13]
(3, 5) [12, 13, 19]
(5, 5) [8]
(9, 5) [14]
(4, 5) [15]
(7, 5) [16]
(8, 5) [17]
Sane results with a Python list (no cdef). What's "sane" is that each subinterpreter has a consistent sequence of values, similar to running the same code in separate processes with a ProcessPoolExecutor:
(1, 5) [1]
(1, 5) [1, 5]
(2, 5) [0]
(1, 5) [1, 5, 6]
(2, 5) [0, 8]
(4, 5) [2]
(2, 5) [0, 8, 10]
(5, 5) [3]
(1, 5) [1, 5, 6, 9]
(4, 5) [2, 11]
(1, 5) [1, 5, 6, 9, 14]
(5, 5) [3, 13]
(2, 5) [0, 8, 10, 12]
(4, 5) [2, 11, 15]
(1, 5) [1, 5, 6, 9, 14, 16]
(5, 5) [3, 13, 17]
(2, 5) [0, 8, 10, 12, 18]
(4, 5) [2, 11, 15, 19]
(3, 5) [4]
(6, 5) [7]
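The difference between the two outputs above can be checked mechanically. The sketch below is not part of the original test: the function name interpreters_with_leaks is hypothetical, and it assumes the first element of each printed tuple identifies the subinterpreter. It flags any interpreter whose list snapshot is not simply its previous snapshot plus one new item.

```python
def interpreters_with_leaks(observations):
    """Return interpreter ids whose successive list snapshots are not append-only.

    `observations` is a sequence of (interpreter_id, list_snapshot) pairs,
    in the order they were printed.
    """
    last_seen = {}
    leaking = set()
    for interp_id, snapshot in observations:
        previous = last_seen.get(interp_id, [])
        # With isolated state, each snapshot is the previous one plus one item.
        if snapshot[:-1] != previous:
            leaking.add(interp_id)
        last_seen[interp_id] = snapshot
    return sorted(leaking)


# First few lines of each output above, keyed by the first tuple element.
cdef_run = [(2, [1]), (2, [1, 2]), (2, [3]), (2, [3, 4])]
python_run = [(1, [1]), (1, [1, 5]), (2, [0]), (1, [1, 5, 6]), (2, [0, 8])]

print(interpreters_with_leaks(cdef_run))    # [2]
print(interpreters_with_leaks(python_run))  # []
```

In the cdef run, interpreter 2 goes from [1, 2] to [3], which can only happen if something other than its own appends changed the list.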
cdef int
cdef int SOMEGLOBAL4 = 0

def uses_global_cdef(x):
    global SOMEGLOBAL4
    SOMEGLOBAL4 += 1
    print(_interpreters.get_current(), SOMEGLOBAL4)
Results using r = executor.map(uses_global_cdef, range(20)) (the first tuple is the subinterpreter id):
(3, 5) 1
(3, 5) 2
(3, 5) 3
(3, 5) 4
(2, 5) 1
(1, 5) 1
(3, 5) 2
(2, 5) 3
(1, 5) 4
(3, 5) 5
(2, 5) 6
(1, 5) 7
(3, 5) 8
(2, 5) 9
(1, 5) 10
(3, 5) 11
(9, 5) 1
(4, 5) 1
(7, 5) 1
(5, 5) 2
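The counter output above can be checked the same way. If each subinterpreter had its own copy of the cdef int, the value printed on its k-th call would be exactly k. The following sketch is not from the original test: the function name is hypothetical, and the first tuple element is again assumed to identify the subinterpreter. It reports the interpreters where that does not hold.

```python
def interpreters_with_shared_counter(observations):
    """Return interpreter ids whose counter values are not 1, 2, 3, ... in order.

    `observations` is a sequence of (interpreter_id, counter_value) pairs,
    in the order they were printed.
    """
    calls_seen = {}
    shared = set()
    for interp_id, value in observations:
        calls_seen[interp_id] = calls_seen.get(interp_id, 0) + 1
        # An isolated counter equals the number of calls made so far
        # in that interpreter.
        if value != calls_seen[interp_id]:
            shared.add(interp_id)
    return sorted(shared)


# First eight lines of the output above, keyed by the first tuple element.
cdef_int_run = [(3, 1), (3, 2), (3, 3), (3, 4), (2, 1), (1, 1), (3, 2), (2, 3)]

print(interpreters_with_shared_counter(cdef_int_run))  # [2, 3]
```

Interpreter 3 prints 2 on its fifth call and interpreter 2 prints 3 on its second, so neither counter is private to its interpreter.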
Is your feature request related to a problem? Please describe.
InterpreterPoolExecutors are to be introduced in 3.14, with a backport for 3.13 (see https://github.com/python/cpython/pull/124548). Cython modules do not support subinterpreters, and thus do not work with an InterpreterPoolExecutor.
Importing in a subinterpreter yields: "ImportError: module Cython.Utils does not support loading in subinterpreters".
It looks like a lot of work was already done in the past to prepare for subinterpreters, by implementing CYTHON_USE_MODULE_STATE. Is this work ongoing? Is there a roadmap to finalize support for subinterpreters?
This impacts downstream packages, such as https://github.com/apache/arrow/issues/42151#issuecomment-2189528499
In my code, I would like to use Cython functions inside an InterpreterPoolExecutor.
The following code will raise: "ImportError: module Cython.Utils does not support loading in subinterpreters"
Note: this example used Python 3.13 with the backported version: https://pypi.org/project/interpreters-pep-734/
Describe the solution you'd like.
No response
Describe alternatives you've considered.
No response
Additional context
No response