NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

python.withPackages doesn't work when embedding python in C #108434

Open nbren12 opened 3 years ago

nbren12 commented 3 years ago

Describe the bug

python3.withPackages does not correctly set the Python path when a Python interpreter is embedded in a C program: the site-packages directory it creates is not added to Python's sys.path. The issue is specific to embedding Python. When Python is run directly from the interpreter executable, sys.path does include the site-packages directory, as python -c 'import sys; print(sys.path)' shows.
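A minimal diagnostic sketch that makes this comparison concrete. The function name site_packages_report is hypothetical; the code only uses the standard os, site and sys modules. Running the same file with the environment's python, and then from the C program through the embedded interpreter (for example by passing its source to PyRun_SimpleString), should show whether the environment's site-packages directory was derived at all and whether it reached sys.path.

```python
import os
import site
import sys


def site_packages_report():
    """Summarise how the running interpreter resolved its prefix and paths.

    The same code can be run by the python executable or handed to an
    embedded interpreter, so the two results can be compared side by side.
    """
    # With no argument, getsitepackages() derives candidates from site.PREFIXES,
    # which normally holds sys.prefix and sys.exec_prefix.
    candidates = site.getsitepackages()
    return {
        "prefix": sys.prefix,
        "exec_prefix": sys.exec_prefix,
        "PYTHONHOME": os.environ.get("PYTHONHOME"),
        "site_packages": candidates,
        # Which of the derived directories actually made it onto sys.path.
        "on_sys_path": [p for p in candidates if p in sys.path],
    }


if __name__ == "__main__":
    for key, value in site_packages_report().items():
        print(f"{key}: {value}")
```

In the failing case described here, one would expect on_sys_path to be empty, or prefix to point somewhere other than the withPackages environment, when the report comes from the embedded interpreter.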

To Reproduce

I built an example based on Python's embedding documentation.

Clone this gist and run nix-build.

Expected behavior

The check phase runs and import cffi succeeds.

Notify maintainers

@FRidh @bennofs @adisbladis

Metadata

Maintainer information:

# a list of nixpkgs attributes affected by the problem
attribute:
# a list of nixos modules affected by the problem
module:
nbren12 commented 3 years ago

This is the output of nix-build on my machine:

$ nix-build
these derivations will be built:
  /nix/store/hkjvrjcjp69nd123r5zgpllwrbllz0b9-example.drv
building '/nix/store/hkjvrjcjp69nd123r5zgpllwrbllz0b9-example.drv'...
unpacking sources
unpacking source archive /nix/store/a4ai863jd39yh0x4qljwgf7c98zmihcl-nix-embedding-failure-example
source root is nix-embedding-failure-example
patching sources
configuring
no configure script, doing nothing
building
build flags: SHELL=/nix/store/xh9cijyqbznza3v5wb5rl6r7r11xd4f9-bash-4.4-p23/bin/bash
clang -lpython3.8 -I/nix/store/a7d6nz13j5rjn74kimvdfqziggxd3lp8-python3-3.8.5-env/include/python3.8 main.c
running tests
check flags: SHELL=/nix/store/xh9cijyqbznza3v5wb5rl6r7r11xd4f9-bash-4.4-p23/bin/bash VERBOSE=y check
./a.out
Today is Mon Jan  4 23:27:02 2021
Importing cffi
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'cffi'
make: *** [Makefile:5: check] Error 255
builder for '/nix/store/hkjvrjcjp69nd123r5zgpllwrbllz0b9-example.drv' failed with exit code 2
error: build of '/nix/store/hkjvrjcjp69nd123r5zgpllwrbllz0b9-example.drv' failed
stale[bot] commented 3 years ago

I marked this as stale due to inactivity.

fluffynukeit commented 1 year ago

Hi, I investigated this problem quite intently over the last day or so. It has been a thorn in my side: I have packaged multiple derivations for personal use that use libpython under the hood, and I have always had to resort to dumping all of the module paths into PYTHONPATH, sometimes even invoking python and querying sys.path to build the PYTHONPATH. Here are my notes, based on nixpkgs revision a59dda9.
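The "query sys.path to build PYTHONPATH" workaround can be sketched as follows, assuming the python executable of the withPackages environment is available to run. The helper name pythonpath_from_interpreter is hypothetical.

```python
import os
import subprocess


def pythonpath_from_interpreter(python_exe):
    """Ask an interpreter for its sys.path and join it into a PYTHONPATH value."""
    # Empty entries (the current-directory marker added for -c) are dropped.
    probe = "import os, sys; print(os.pathsep.join(p for p in sys.path if p))"
    result = subprocess.run(
        [python_exe, "-c", probe],
        check=True,           # raise CalledProcessError if the interpreter fails
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

Calling it with the path to the environment's bin/python3 and exporting the result as PYTHONPATH for the embedding program is the brute-force stopgap described above; note that it also copies the standard-library directories into PYTHONPATH.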

The general problem is that when a Python application is started through the python interpreter executable, there is logic that automatically identifies an appropriate value for PYTHONHOME, but this logic is apparently skipped when Python is embedded. Without a correct PYTHONHOME, the rest of Nix's Python infrastructure falls apart.
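To illustrate what "automatically identifies" means, here is a deliberately simplified model of the landmark search CPython performs when started from its executable (historically in Modules/getpath.c, in Modules/getpath.py since 3.11): starting from the executable's directory, walk upward until lib/pythonX.Y/os.py exists. This is not CPython's algorithm. It ignores PYTHONHOME, pyvenv.cfg, the exec_prefix landmark and the compiled-in fallback prefix, and guess_prefix is a hypothetical name.

```python
import os
import sys


def guess_prefix(executable, version=None, resolve_symlinks=True):
    """Walk up from the executable's directory looking for lib/pythonX.Y/os.py.

    Simplified model only: returns the first directory containing the
    landmark, or None if the filesystem root is reached without a match.
    """
    if version is None:
        version = "%d.%d" % sys.version_info[:2]
    # Following symlinks first means a linked interpreter is located via the
    # directory it points into, not the directory the link lives in.
    if resolve_symlinks:
        path = os.path.realpath(executable)
    else:
        path = os.path.abspath(executable)
    directory = os.path.dirname(path)
    while True:
        landmark = os.path.join(directory, "lib", "python" + version, "os.py")
        if os.path.isfile(landmark):
            return directory
        parent = os.path.dirname(directory)
        if parent == directory:  # reached the filesystem root
            return None
        directory = parent
```

An embedding program has no interpreter executable of its own to start this walk from, which is one way to picture why the home it ends up with can differ from the environment's.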

Consult: https://github.com/NixOS/nixpkgs/tree/master/pkgs/development/interpreters/python

General note: the overall situation is complicated by ambiguity about how the normal initialization sequence of the standard CPython interpreter differs from that of the libpython library.

Custom Python environments in Nix are built from the combination of python.buildEnv (which withPackages calls under the hood) and a special sitecustomize.py. For a program with embedded Python to use a Nix Python environment, the program must be exposed to both of these.

To expose a program to python.buildEnv/withPackages, include that program in the list of packages (or in extraLibs, in the case of python.buildEnv). Because these functions automatically filter out derivations that were not created with buildPythonPackage for a matching Python version, you have to trick them into treating the program as a Python module by using toPythonModule. This makes python.buildEnv identify the program (which must be in the bin directory of its store path) and wrap it with the environment variables that sitecustomize.py is meant to consume.
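The consuming side of that handoff can be modelled in a few lines. This is a sketch in the spirit of nixpkgs' sitecustomize.py, not a copy of it: the default variable name NIX_PYTHONPATH is an assumption based on that file and should be checked against the linked directory, and add_dirs_from_env is a hypothetical name. It uses os.pathsep, which is ':' on the POSIX systems Nix targets.

```python
import os
import site


def add_dirs_from_env(var="NIX_PYTHONPATH"):
    """Add each directory listed in the environment variable to sys.path.

    The variable is popped so it does not leak into child processes, and
    site.addsitedir() is used instead of sys.path.append() so that .pth
    files in each directory are processed. Returns the directories handled.
    """
    value = os.environ.pop(var, None)
    if not value:
        return []
    processed = []
    for directory in value.split(os.pathsep):
        if directory:
            site.addsitedir(directory)
            processed.append(directory)
    return processed
```

If the wrapper never runs (as with a plain embedding program), the variable is simply absent and nothing is added, which matches the symptom in the original report.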

According to Python's sys.path initialization sequence, "home" is identified automatically when the python interpreter executable is invoked. Once home is known, the default Python search paths are added, one of which is site-packages, which in Nix contains the sitecustomize.py needed to configure Python to use the Nix Python environment. With the embedded library libpython, however, the automatic identification of home is skipped, gets the wrong answer, or is confused by the environment's symlinks. If home isn't right, then site-packages isn't right either: Nix's sitecustomize.py is not on the Python search path, so it can't be imported during the standard sys.path initialization sequence, and the paths never get "nixified".
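The chain "home, then site-packages, then sitecustomize.py" can be checked directly with the standard site.getsitepackages(prefixes) function, which derives site-packages paths from a given prefix. The helper name find_sitecustomize is hypothetical; running it against the prefix the embedded interpreter actually reports (its sys.prefix) is one quick way to test the explanation above.

```python
import os
import site


def find_sitecustomize(prefix):
    """Return the sitecustomize.py found in prefix's site-packages, or None."""
    # getsitepackages([prefix]) applies the platform's layout rules, e.g.
    # <prefix>/lib/pythonX.Y/site-packages on POSIX.
    for directory in site.getsitepackages([prefix]):
        candidate = os.path.join(directory, "sitecustomize.py")
        if os.path.isfile(candidate):
            return candidate
    return None
```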

A workaround looks like the snippet below. It differs from @nbren12's demonstration case because the final executable has to be wrapped. My situation was also more complicated, because I had to use postgresql.withPackages as well as python.withPackages.

# Build the multicorn Python package. It includes multicorn.so, which uses libpython.so.
multicorn = python.pkgs.buildPythonPackage { /* removed for brevity */ };

# Build a PostgreSQL environment that uses multicorn. This derivation loses the flags that
# identify multicorn to Nix as a Python module, because they are not propagated.
postgres-env = postgresql.withPackages (ps: [ multicorn ]);

# Make a Python environment that includes a derivation not built with buildPythonPackage.
# toPythonModule is needed to trick withPackages into accepting it.
postgres-with-python-basic = python.withPackages (ps: [ (ps.toPythonModule postgres-env) ]);

# Override makeWrapperArgs to set PYTHONHOME explicitly. (I was unable to figure out the right
# syntax for doing this in one step, without the intermediate derivation.)
postgres-with-python = postgres-with-python-basic.override {
  makeWrapperArgs = [ "--set PYTHONHOME $out" ];  # I think this could be added to python.buildEnv without side effects.
};
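What --set PYTHONHOME $out achieves at run time can be mimicked in Python with a launcher that forces PYTHONHOME for a child process. This is only an illustration: run_with_pythonhome is a hypothetical name, and makeWrapper itself generates a small shell wrapper rather than anything like this. The value must point at a prefix that contains the standard library, otherwise the child interpreter fails to start.

```python
import os
import subprocess


def run_with_pythonhome(argv, pythonhome):
    """Run argv with PYTHONHOME forced, like a wrapper built with --set PYTHONHOME."""
    env = dict(os.environ)          # inherit everything else unchanged
    env["PYTHONHOME"] = pythonhome  # overrides any value already set
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)
```

For the embedding case, argv would be the program that loads libpython and pythonhome the environment's store path; that program's embedded interpreter then reads PYTHONHOME during initialization.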

One thing I think is advantageous about this approach is that the embedded python environment used by the program is not part of the program's buildInputs, so changing the embedded python environment does not trigger a rebuild of the application.