PyO3 / pyo3

Rust bindings for the Python interpreter
https://pyo3.rs
Apache License 2.0

"undefined symbol: PyTuple_Type" when importing `subprocess` inside evcxr notebook. #2000

Closed jonasbb closed 2 years ago

jonasbb commented 2 years ago

Bug Description

Importing the subprocess module fails with an "undefined symbol: PyTuple_Type" error. This only happens when running inside the evcxr REPL or Jupyter kernel; it does not happen when executing the code via cargo run.

Not all Python modules are problematic. For example, the sys module can be imported without any problem. Evaluating expressions like 1 + 2 + 3 also works when run via evcxr.

Steps to Reproduce

  1. Install evcxr or evcxr Jupyter kernel version 0.12.0. You can also use an online version of the Jupyter kernel via mybinder.
  2. Specify pyo3 as a dependency by executing :dep pyo3 inside the repl/notebook.
  3. Running this snippet in the repl/notebook fails with the backtrace below:

      use pyo3::prelude::*;
    
      fn main() -> PyResult<()> {
          pyo3::prepare_freethreaded_python();
          Python::with_gil(|py| {
              py.import("subprocess").unwrap_err().print(py);
              Ok(())
          })
      }
      main()

Backtrace

Traceback (most recent call last):
  File "/usr/lib64/python3.10/subprocess.py", line 69, in <module>
    import msvcrt
ModuleNotFoundError: No module named 'msvcrt'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.10/subprocess.py", line 74, in <module>
    import _posixsubprocess
ImportError: /usr/lib64/python3.10/lib-dynload/_posixsubprocess.cpython-310-x86_64-linux-gnu.so: undefined symbol: PyTuple_Type

Your operating system and version

Fedora 35

Your Python version (python --version)

Python 3.10.0

Your Rust version (rustc --version)

rustc 1.56.0 (09c42c458 2021-10-18)

Your PyO3 version

0.15.0

How did you install python? Did you use a virtualenv?

dnf, pre-installed Python version in Fedora

Additional Info

Output of PYO3_PRINT_CONFIG=1

  -- PYO3_PRINT_CONFIG=1 is set, printing configuration and halting compile --
  implementation=CPython
  version=3.10
  shared=true
  abi3=false
  lib_name=python3.10
  lib_dir=/usr/lib64
  executable=/usr/bin/python
  pointer_width=64
  build_flags=WITH_THREAD
  suppress_build_script_link_lines=false

This is the beginning of the /usr/lib64/python3.10/subprocess.py file on my system. Line 69 is the import msvcrt. It is inside a try-except block. The PyTuple_Type error appears on line 74 when it tries to import _posixsubprocess.

# subprocess - Subprocesses with accessible I/O streams
#
# For more information about this module, see PEP 324.
#
# Copyright (c) 2003-2005 by Peter Astrand <astrand@lysator.liu.se>
#
# Licensed to PSF under a Contributor Agreement.

r"""Subprocesses with accessible I/O streams

This module allows you to spawn processes, connect to their
input/output/error pipes, and obtain their return codes.

For a complete description of this module see the Python documentation.

Main API
========
run(...): Runs a command, waits for it to complete, then returns a
          CompletedProcess instance.
Popen(...): A class for flexibly executing a command in a new process

Constants
---------
DEVNULL: Special value that indicates that os.devnull should be used
PIPE:    Special value that indicates a pipe should be created
STDOUT:  Special value that indicates that stderr should go to stdout

Older API
=========
call(...): Runs a command, waits for it to complete, then returns
    the return code.
check_call(...): Same as call() but raises CalledProcessError()
    if return code is not 0
check_output(...): Same as check_call() but returns the contents of
    stdout instead of a return code
getoutput(...): Runs a command in the shell, waits for it to complete,
    then returns the output
getstatusoutput(...): Runs a command in the shell, waits for it to complete,
    then returns a (exitcode, output) tuple
"""

import builtins
import errno
import io
import os
import time
import signal
import sys
import threading
import warnings
import contextlib
from time import monotonic as _time
import types

try:
    import fcntl
except ImportError:
    fcntl = None

__all__ = ["Popen", "PIPE", "STDOUT", "call", "check_call", "getstatusoutput",
           "getoutput", "check_output", "run", "CalledProcessError", "DEVNULL",
           "SubprocessError", "TimeoutExpired", "CompletedProcess"]
           # NOTE: We intentionally exclude list2cmdline as it is
           # considered an internal implementation detail.  issue10838.

try:
    import msvcrt
    import _winapi
    _mswindows = True
except ModuleNotFoundError:
    _mswindows = False
    import _posixsubprocess
    import select
    import selectors
else:
    from _winapi import (CREATE_NEW_CONSOLE, CREATE_NEW_PROCESS_GROUP,
                         STD_INPUT_HANDLE, STD_OUTPUT_HANDLE,
                         STD_ERROR_HANDLE, SW_HIDE,
                         STARTF_USESTDHANDLES, STARTF_USESHOWWINDOW,
                         ABOVE_NORMAL_PRIORITY_CLASS, BELOW_NORMAL_PRIORITY_CLASS,
                         HIGH_PRIORITY_CLASS, IDLE_PRIORITY_CLASS,
                         NORMAL_PRIORITY_CLASS, REALTIME_PRIORITY_CLASS,
                         CREATE_NO_WINDOW, DETACHED_PROCESS,
                         CREATE_DEFAULT_ERROR_MODE, CREATE_BREAKAWAY_FROM_JOB)

    __all__.extend(["CREATE_NEW_CONSOLE", "CREATE_NEW_PROCESS_GROUP",
                    "STD_INPUT_HANDLE", "STD_OUTPUT_HANDLE",
                    "STD_ERROR_HANDLE", "SW_HIDE",
                    "STARTF_USESTDHANDLES", "STARTF_USESHOWWINDOW",
                    "STARTUPINFO",
                    "ABOVE_NORMAL_PRIORITY_CLASS", "BELOW_NORMAL_PRIORITY_CLASS",
                    "HIGH_PRIORITY_CLASS", "IDLE_PRIORITY_CLASS",
                    "NORMAL_PRIORITY_CLASS", "REALTIME_PRIORITY_CLASS",
                    "CREATE_NO_WINDOW", "DETACHED_PROCESS",
                    "CREATE_DEFAULT_ERROR_MODE", "CREATE_BREAKAWAY_FROM_JOB"])

davidhewitt commented 2 years ago

Hmmm, I'm not familiar at all with evcxr but this sounds like an environment issue of some form? The import msvcrt line makes me suspect that Python is running as if it thinks it's Windows? But you state you're running Linux.

Are you able to run evcxr with the PYO3_PRINT_CONFIG env var set to see what configuration we're picking up?

jonasbb commented 2 years ago

I am not familiar with the evcxr internals myself. I do know that it does some stuff with dynamic linking and loading. That could be a reason why the _posixsubprocess.cpython-310-x86_64-linux-gnu.so complains about missing symbols.

davidhewitt commented 2 years ago

Thanks, looks like PyO3 has configured correctly. TBH I'm not familiar with evcxr at all, it would probably be wise to open an issue on their repo too as they might have a better idea what's going on. I'm happy to offer PyO3 knowledge to help figure this out but I don't think I'll have time to investigate this myself any time soon.

davidhewitt commented 2 years ago

> I do know that it does some stuff with dynamic linking and loading. That could be a reason why the _posixsubprocess.cpython-310-x86_64-linux-gnu.so complains about missing symbols.

Yes, that's my hunch too.

jonasbb commented 2 years ago

Thanks for the help. I opened https://github.com/google/evcxr/issues/201 to see if that can/should work.

sigmaSd commented 2 years ago

I can reproduce the crash with a normal project.

code:

fn main() -> () {
    use pyo3::prelude::*;
    pyo3::prepare_freethreaded_python();
    Python::with_gil(|py| {
        let sub = py.import("subprocess").unwrap();
    });
}

panic:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: PyErr { type: <class 'ImportError'>, value: ImportError('/home/gitpod/.pyenv/versions/3.8.12/lib/python3.8/lib-dynload/_posixsubprocess.cpython-38-x86_64-linux-gnu.so: undefined symbol: PyTuple_Type'), traceback: Some(<traceback object at 0x7fadc671e880>) }', src/main.rs:6:42

meta:

rustc: 1.56.1 (59eed8a2a 2021-11-01)
python: Python 3.8.12
pyo3: "0.15.1"
os: Linux ws-485d7766-07df-4743-911f-9cad7e88ad41 5.4.0-1051-gke #54-Ubuntu SMP Thu Aug 5 18:52:13 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

This doesn't happen with import("os") or import("sys"), for example.

jonasbb commented 2 years ago

I managed to find a quite ugly workaround for now.

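use libloading::os::unix::{Library, RTLD_GLOBAL, RTLD_NOW};

// Make Python's output unbuffered so that prints show up interactively.
std::env::set_var("PYTHONUNBUFFERED", "1");
// Load libpython with RTLD_GLOBAL so its symbols are visible to libraries loaded later.
let lib = unsafe {
    Library::open(Some("/usr/lib64/libpython3.10.so"), RTLD_NOW | RTLD_GLOBAL)?
};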

The call to libloading loads the Python dynamic library libpython3.10.so with RTLD_GLOBAL, which ensures that its symbols are present and so prevents the "undefined symbol" error. Setting PYTHONUNBUFFERED is necessary to make prints behave interactively. The Python output was buffered, probably because the communication with evcxr is via pipes. But in an interactive notebook it should be unbuffered or at most line buffered.
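For illustration, the same effect can be sketched without the libloading dependency by calling dlopen directly. This is a minimal sketch that assumes Linux with glibc: the RTLD_NOW and RTLD_GLOBAL values are the glibc ones, and load_globally is a hypothetical helper name, not part of PyO3 or evcxr.

```rust
use std::ffi::{c_void, CStr, CString};
use std::os::raw::{c_char, c_int};

// Flag values as defined by glibc on Linux (assumption: other platforms differ).
const RTLD_NOW: c_int = 0x2;
const RTLD_GLOBAL: c_int = 0x100;

extern "C" {
    fn dlopen(filename: *const c_char, flags: c_int) -> *mut c_void;
    fn dlerror() -> *mut c_char;
}

/// Hypothetical helper: load a shared library so that its symbols are
/// visible to every library that is loaded afterwards.
fn load_globally(path: &str) -> Result<*mut c_void, String> {
    // Fails if the path contains an interior NUL byte.
    let c_path = CString::new(path).map_err(|e| e.to_string())?;
    // SAFETY: `c_path` is a valid NUL-terminated string for the whole call.
    let handle = unsafe { dlopen(c_path.as_ptr(), RTLD_NOW | RTLD_GLOBAL) };
    if handle.is_null() {
        // SAFETY: dlerror returns either null or a valid C string.
        let msg = unsafe {
            let err = dlerror();
            if err.is_null() {
                String::from("unknown dlopen error")
            } else {
                CStr::from_ptr(err).to_string_lossy().into_owned()
            }
        };
        return Err(msg);
    }
    // The handle is never closed, so the symbols stay available.
    Ok(handle)
}
```

Calling load_globally("/usr/lib64/libpython3.10.so") before the first Python import should then have the same effect as the libloading call. The path is the one from the workaround and will differ between systems.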


The problem is that _posixsubprocess.cpython-310-x86_64-linux-gnu.so (incorrectly?) does not have a dependency on libpython3.10.so specified. That means that when it is loaded with dlopen, the symbols defined in libpython3.10 are not loaded along with it; they must already be present in the loading process. dlopen and libloading (used by evcxr) both default to RTLD_LOCAL when loading new libraries. The documentation for RTLD_LOCAL says:

> Symbols defined in this library are not made available to resolve references in subsequently loaded libraries.

The snippet above just ensures that the Python symbols will be visible to all later libraries.

It has the downside of hard-coding the Python library path again, even though PyO3 already linked against it. The path needs to be updated every time the system Python is updated. It also requires system-specific APIs.
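One way to avoid the hard-coded path might be to derive it from the lib_dir and lib_name values that PYO3_PRINT_CONFIG=1 prints. The sketch below assumes that the output has already been captured as text and that the shared library follows the Linux naming scheme lib<lib_name>.so inside lib_dir; parse_config and libpython_path are hypothetical helper names.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Parse `key=value` lines such as those printed by PYO3_PRINT_CONFIG=1.
/// Lines without `=` are skipped, and so is the banner line, whose
/// "key" part contains spaces.
fn parse_config(output: &str) -> HashMap<String, String> {
    output
        .lines()
        .filter_map(|line| line.trim().split_once('='))
        .filter(|(key, _)| !key.contains(' '))
        .map(|(key, value)| (key.to_string(), value.to_string()))
        .collect()
}

/// Assumption: the library is called `lib<lib_name>.so` and lives
/// directly in `lib_dir`, as is typical on Linux.
fn libpython_path(config: &HashMap<String, String>) -> Option<PathBuf> {
    let dir = config.get("lib_dir")?;
    let name = config.get("lib_name")?;
    Some(PathBuf::from(dir).join(format!("lib{}.so", name)))
}
```

With the configuration shown in this issue (lib_dir=/usr/lib64, lib_name=python3.10), libpython_path yields /usr/lib64/libpython3.10.so, the same path that the workaround hard-codes.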


Even with those symbols available, interactive use is not quite pain-free since the output is buffered, which needs to be changed manually. I assume pyo3 could offer a feature flag like auto-initialize but for evcxr which would perform these adjustments before executing any Python code. Of course, assuming this is of interest to PyO3.

davidhewitt commented 2 years ago

> The problem is that _posixsubprocess.cpython-310-x86_64-linux-gnu.so (incorrectly?) does not have a dependency on libpython3.10.so specified.

Ah, that's actually by design. (See https://bugs.python.org/issue21536.) It's so that extensions built by dynamically-linked Python interpreters can be loaded by statically-linked Python interpreters.

The situation you find yourself in here is actually somewhat similar to #700, where dlopen with RTLD_GLOBAL was again the solution.

> I assume pyo3 could offer a feature flag like auto-initialize but for evcxr which would perform these adjustments before executing any Python code. Of course, assuming this is of interest to PyO3.

My personal feeling is that evcxr is quite niche at the moment, so I wouldn't particularly want to maintain a feature in PyO3 which exactly solves this use case (which would include writing careful tests, for example).

That isn't to say that we can't add more reusable functionality which could help in your use-case. For example, we could add a macro which expands to the location of the python shared library, so that you don't need to hard-code it.

E.g. it could be based off the PYO3_CONFIG build mechanism - maybe we could have pyo3_config_var!("lib_path"). Or maybe just pyo3_lib_path!(). I'm open to hearing ideas on what's better.

jonasbb commented 2 years ago

Thanks for all the help. So there is nothing easy to be done for either evcxr or pyo3. I agree that evcxr doesn't feel too popular right now. It would be nice to have access to the library path from PYO3_CONFIG (or all values) but that is not a problem for me.

> E.g. it could be based off the PYO3_CONFIG build mechanism - maybe we could have pyo3_config_var!("lib_path"). Or maybe just pyo3_lib_path!(). I'm open to hearing ideas on what's better.

An alternative could be a module filled with many const values, similar to what the built crate does.
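As a rough sketch of what such a module could look like: everything here is hypothetical and PyO3 does not provide it. The module name python_config, the constant names, and the lib_path helper are made up for illustration, and the values are copied from the PYO3_PRINT_CONFIG output earlier in this issue.

```rust
/// Hypothetical module of build-time constants; not part of PyO3.
pub mod python_config {
    pub const IMPLEMENTATION: &str = "CPython";
    pub const VERSION: &str = "3.10";
    pub const SHARED: bool = true;
    pub const ABI3: bool = false;
    pub const LIB_NAME: &str = "python3.10";
    pub const LIB_DIR: &str = "/usr/lib64";
    pub const EXECUTABLE: &str = "/usr/bin/python";
    pub const POINTER_WIDTH: u32 = 64;

    /// Assumes the Linux naming scheme `lib<LIB_NAME>.so`.
    pub fn lib_path() -> String {
        format!("{}/lib{}.so", LIB_DIR, LIB_NAME)
    }
}
```

A caller could then pass python_config::lib_path() to the dlopen workaround instead of a string literal, so the path would follow whatever interpreter the build was configured against.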