apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

Support for SubInterpreters and InterpreterPoolExecutors #44511

Open paultiq opened 1 month ago

paultiq commented 1 month ago

Describe the enhancement requested

InterpreterPoolExecutor is to be introduced in Python 3.14 (https://github.com/python/cpython/pull/124548) and is usable on 3.13 through a backport package.

At present, pyarrow fails with: "ImportError: module pyarrow.lib does not support loading in subinterpreters"

See the following example: ThreadPoolExecutor (TPE) and ProcessPoolExecutor (PPE) work, InterpreterPoolExecutor (IPE) does not.

from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import ProcessPoolExecutor
from interpreters_backport.concurrent.futures.interpreter import InterpreterPoolExecutor

def try_tpe():
    with ThreadPoolExecutor() as executor:
        f = executor.submit(exec, "import pyarrow as pa")
        f.result()

def try_ppe():
    with ProcessPoolExecutor() as executor:
        f = executor.submit(exec, "import pyarrow as pa")
        f.result()

def try_ipe():
    with InterpreterPoolExecutor() as executor:
        f = executor.submit(exec, "import pyarrow as pa")
        f.result()

if __name__ == "__main__":
    try_tpe()
    try_ppe()
    try_ipe()

Result:

Traceback (most recent call last):
  File "c:\gitother\iscratch\.venv\Lib\site-packages\interpreters_backport\concurrent\futures\interpreter.py", line 181, in run
    self._exec(script)
    ~~~~~~~~~~^^^^^^^^
  File "c:\gitother\iscratch\.venv\Lib\site-packages\interpreters_backport\concurrent\futures\interpreter.py", line 127, in _exec
    raise ExecutionFailed(excinfo)
interpreters_backport.concurrent.futures.interpreter.ExecutionFailed: ImportError: module pyarrow.lib does not support loading in subinterpreters

Uncaught in the interpreter:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\gitother\iscratch\.venv\Lib\site-packages\interpreters_backport\concurrent\futures\interpreter.py", line 111, in _call_pickled
    cls._call(fn, args, kwargs, resultsid)
    ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\gitother\iscratch\.venv\Lib\site-packages\interpreters_backport\concurrent\futures\interpreter.py", line 100, in _call
    res = func(*args or (), **kwargs or {})
  File "<string>", line 1, in <module>
  File "c:\gitother\iscratch\.venv\Lib\site-packages\pyarrow\__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
ImportError: module pyarrow.lib does not support loading in subinterpreters

Tested with pyarrow 18.0.0rc0 on Python 3.13 (amd64), using the https://pypi.org/project/interpreters-pep-734/ package, which backports InterpreterPoolExecutor to 3.13.
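
As a possible stopgap, here is a minimal sketch of a probe that checks whether a module can be imported inside a given executor's workers and falls back to another executor type if not. The helper names `can_import_in` and `pick_executor` are made up for illustration; they are not part of pyarrow or concurrent.futures. The demo only uses ThreadPoolExecutor and the stdlib `json` module so that it runs without the backport installed.

```python
from concurrent.futures import ThreadPoolExecutor


def can_import_in(executor_factory, module_name):
    """Return True if `import module_name` succeeds inside a worker.

    Hypothetical helper: uses the same exec-based probe as the example
    in this report, so the submitted callable needs no pickling of
    user-defined functions.
    """
    try:
        with executor_factory() as executor:
            executor.submit(exec, f"import {module_name}").result()
        return True
    except Exception:
        # With an interpreter pool, the wrapped ImportError from the
        # report would be expected to surface here.
        return False


def pick_executor(module_name, candidates):
    """Return the first executor factory whose workers can import the module."""
    for factory in candidates:
        if can_import_in(factory, module_name):
            return factory
    raise RuntimeError(f"no candidate executor can import {module_name!r}")


if __name__ == "__main__":
    # Stdlib-only demo; swap in other executor classes and "pyarrow" locally.
    chosen = pick_executor("json", [ThreadPoolExecutor])
    print(chosen.__name__)
```

Given the failure shown above, passing `[InterpreterPoolExecutor, ProcessPoolExecutor]` as candidates with `"pyarrow"` would presumably fall through to ProcessPoolExecutor, though I have not measured the cost of spinning up a throwaway pool for the probe.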

Component(s)

Python

paultiq commented 1 month ago

This may be a dupe of https://github.com/apache/arrow/issues/42151, so just linking it for clarity.

I guess the question is whether there's any hope of getting this on the roadmap, or whether it's on a long hold waiting for Cython.

I opened https://github.com/cython/cython/issues/6445 on the Cython side asking more or less the same question.