Closed LukaJuricic closed 1 month ago
This change entirely disables the mechanism by which colcon tells setuptools where to install the package. Without this, colcon will try to install to the interpreter default location, which is completely incorrect. Even when using a virtual environment as you're describing, the installation location will be incorrect. In some simple tests on my machine, this doesn't even solve the problem you're describing, and builds still aren't able to find packages provided by the venv.
Depending on how the virtual environment was created, there is another factor which is likely contributing to the issue you're facing.
Colcon always uses the same Python interpreter to build/install Python packages as the one which was used to invoke colcon itself. If you're using the colcon executable, it will therefore use the interpreter which was used when colcon-core was installed. This means that regardless of your virtual environment or what interpreter python or python3 are resolved to, your packages will probably not be built/installed using the interpreter you want.
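To see the mismatch described here, it can help to compare the interpreter that is running a piece of code with whatever python3 resolves to on PATH. This is a minimal sketch using only the standard library; the helper name `describe_interpreters` is made up for illustration and is not part of colcon.

```python
import shutil
import sys


def describe_interpreters():
    """Return the running interpreter and the python3 found on PATH."""
    return {
        # Interpreter executing this code; a tool launched through an
        # installed entry point keeps using the interpreter it was installed with.
        "running": sys.executable,
        # What a shell would pick for `python3`; None if it is not on PATH.
        "python3_on_path": shutil.which("python3"),
    }


if __name__ == "__main__":
    for name, path in describe_interpreters().items():
        print(f"{name}: {path}")
```

Running this inside an activated venv shows the venv's python3 on PATH. On Linux, the shebang line of the colcon script usually names the interpreter it was installed with, which may be a different one.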
This problem is a lot worse if your virtual environment uses a different Python ABI version from the one colcon is using. Even if you were able to force colcon to find the virtual environment's packages, they may not be usable by colcon's interpreter (as would be the case for compiled Python modules like the ones in platlib).
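A quick way to check whether two interpreters are likely to be ABI-compatible is to print the same few values from each and compare them. The sketch below uses only the standard library; `abi_summary` is a hypothetical helper name.

```python
import sys
import sysconfig


def abi_summary():
    """Collect values that indicate whether compiled modules are interchangeable."""
    return {
        # Major.minor version of the running interpreter.
        "version": "{}.{}".format(*sys.version_info[:2]),
        # Implementation tag used in bytecode file names.
        "cache_tag": sys.implementation.cache_tag,
        # ABI tag used in extension module file names; may be None on some platforms.
        "soabi": sysconfig.get_config_var("SOABI"),
        # Directory where platform-specific (compiled) packages are installed.
        "platlib": sysconfig.get_path("platlib"),
    }


if __name__ == "__main__":
    for key, value in abi_summary().items():
        print(f"{key}: {value}")
```

Running the script once with the venv interpreter and once with the interpreter colcon uses, then comparing the output, shows whether the version or ABI tag differs.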
A naive suggestion might be to simply make colcon always use the python3 executable and not necessarily the one it was invoked with. Colcon itself loads setuptools to get information about the package prior to building it, so using a different interpreter (and likely setuptools version) for identification and metadata extraction from the one used to build the package sounds like a recipe for bugs. Additionally, not all systems use python3 as the "default" Python interpreter, namely Windows.
I don't think there's a perfect "solution" to this problem, but I can offer some workarounds that may work for you.
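1. Make sure the interpreter behind the colcon executable is the one you want, for example by installing colcon-core with that interpreter.
2. Invoke colcon as a module (python3 -m colcon) instead of using the colcon executable that implicitly uses a different interpreter. While this is the cleanest solution, it does require that the venv interpreter can see all of the colcon packages and their dependencies, so --system-site-packages may be necessary.

$ which python3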
/usr/bin/python3
$ python3 -m venv use_this --system-site-packages
$ . use_this/bin/activate
(use_this) $ which python3
~/use_this/bin/python3
(use_this) $ python3 -m colcon build ...
Scott, thanks for having a look at this!
I'm a bit puzzled at your results, that's not at all what I get. I tried to:
docker run -it ros:rolling bash
pip install .
~/.local/bin/colcon build --packages-select demo_nodes_py
I see the patched sitecustomize in build/demo_nodes_py/prefix_override, the modules in install/demo_nodes_py/lib/python3.10/site-packages/demo_nodes_py and the scripts in install/lib/demo_nodes_py.
As for ament_virtualenv, it works by calling colcon build in a shell without any virtual environment activated.
Instead, the package to be installed must use a custom install command (provided by ament_virtualenv) in the setup.py, which adds a post-install script to the default install command. Only at this stage is the virtual environment created, the dependencies installed in it, and the shebangs of the scripts in install/lib/<pkg> patched with the virtual environment python interpreter.
This means that the package is installed as a normal package, with the system python and the usual ROS install locations, while the only stage when the virtual environment python interpreter is used is to install the dependencies, through subprocess.call(["<pkg_install_dir>/venv/bin/python", "-m", "pip", "install", ...]). And this is exactly the stage where it fails without the proposed patch.
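The dependency-install step described above can be sketched roughly as follows. This is not ament_virtualenv's actual code; the helper names (`venv_python`, `pip_install_command`, `install_into_venv`) are hypothetical, and the venv directory is whatever path the post-install script created.

```python
import os
import subprocess
from pathlib import Path


def venv_python(venv_dir):
    """Return the path of the interpreter inside a venv directory."""
    venv_dir = Path(venv_dir)
    if os.name == "nt":
        # Windows venvs keep executables under Scripts\.
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def pip_install_command(venv_dir, requirements):
    """Build the argument list for installing requirements into the venv."""
    return [str(venv_python(venv_dir)), "-m", "pip", "install", *requirements]


def install_into_venv(venv_dir, requirements):
    """Run pip with the venv interpreter and return the process exit code."""
    # The child process inherits the parent's environment, including PYTHONPATH.
    return subprocess.call(pip_install_command(venv_dir, requirements))
```

Because the child process inherits the environment, a sitecustomize directory that is on PYTHONPATH is also picked up by the venv interpreter, which is consistent with the failure reported in this thread.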
I wasn't able to get ament_virtualenv to work, but I was able to reproduce the problem by attempting to use the venv it created while the sitecustomize was on PYTHONPATH. I see the problem now.
The breakage from the original patch that I was describing was verified by CI, where all of the python build tests were failing to find the package installed to the anticipated location (presumably colcon installed the package under /usr or /usr/local).
Since we know which prefix we want to override, and we know that any subprocess that activates a venv would change that prefix, we can just make the conditional compare against that prefix instead.
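The idea of comparing against the known prefix can be illustrated with a small pure function. This is a sketch of the decision only, under the assumption that the override records the prefix of the interpreter colcon was invoked with; the function and parameter names are hypothetical and the real sitecustomize may be structured differently.

```python
def effective_prefix(current_prefix, expected_prefix, install_prefix):
    """Return the prefix a sitecustomize-style override would leave in place.

    current_prefix:  sys.prefix as seen by the running interpreter
    expected_prefix: prefix of the interpreter colcon was invoked with
    install_prefix:  location the package should be installed to
    """
    if current_prefix == expected_prefix:
        # Same environment that colcon prepared: redirect installs.
        return install_prefix
    # A venv (or any other environment) changed the prefix: leave it alone.
    return current_prefix


if __name__ == "__main__":
    # Paths below are illustrative only.
    print(effective_prefix("/usr", "/usr", "/ws/install/pkg"))
    print(effective_prefix("/ws/install/pkg/venv", "/usr", "/ws/install/pkg"))
```

With a check of this shape, a subprocess started with the venv interpreter keeps its own prefix, while the normal build still installs to the package's install location.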
Please verify that this change still resolves the issue for ament_virtualenv.
Awesome, thanks for taking a second look!
I checked with our humble-ported version of ament_virtualenv, and it works.
This reminded me that we still have to publish it :sweat_smile: . If anybody stumbles on this in the future, it will be most likely here.
Currently a sitecustomize Python module is loaded during ament_python package installs. This impedes the use of a virtual environment during the install process, as is the case while using ament_virtualenv. Because the sitecustomize module's path configuration mimics that of a venv, pyvenv.cfg is not found even if a script is executed using the venv python binary, and the path configuration will not be correct.
With this fix, the sitecustomize path configuration will apply only while not using a virtual environment. In particular, in the case of ament_virtualenv, the pip dependencies will be installed in the venv lib, while the modules of the package being built will still end up in the package install lib.
NB: ament_virtualenv is currently being ported to humble; I did not test it in the original targeted distribution (electron).
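For reference, one common way to tell whether an interpreter is running inside a venv is to compare sys.prefix with sys.base_prefix. The sketch below shows that check; it is not necessarily the condition used in this patch, and `in_virtual_environment` is a hypothetical helper name.

```python
import sys


def in_virtual_environment(prefix=None, base_prefix=None):
    """Return True when the interpreter appears to run inside a venv."""
    # Default to the values of the running interpreter.
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    # In a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base installation.
    return prefix != base_prefix


if __name__ == "__main__":
    print(in_virtual_environment())
```

A check like this is only meaningful before anything has overwritten sys.prefix, since a prefix override makes the two values differ as well.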