Closed Crivella closed 1 month ago
@Flamefire You could probably give the most informed review of this. I would be inclined to accept it.
I don't see any problem with this in general besides increasing the number of (potentially superflous) parameters we pass to configure.
However:
The GROMACS change was very focused on that specific software and restricted to "faulty" versions. Here we'd not only need to pass Python3_FIND_VIRTUALENV
but also Python_FIND_VIRTUALENV
and Python_FIND_VIRTUALENV
to cover all cases.
I'd prefer to pass the Python binary path explicitly as done in this PR This not only avoids it finding a virtualenv Python ("fixed" here) but also covers many other cases we might not found yet.
So that is more general without unnecessarily increasing verbosity for cases where we don't need it.
@Crivella What do you think about that? Can you try #3282 with your https://github.com/easybuilders/easybuild-easyblocks/pull/3373 not setting the Python3_FIND_VIRTUALENV
to see if that works?
@Flamefire Started the test it will take a couple hours to finish, but since stage 1 passed i don't think there would be a problem.
My concern was the order with which the hints where evaluated since #3283 was necessary, but Python_ROOT_DIR
seems to take precedence above all
Just ran a minimal test with
cmake_minimum_required(VERSION 3.15.0)
project(MyProject)
find_package(Python3)
and
#!/usr/bin/env bash
module purge
module load CMake/3.29.3-GCCcore-13.3.0
module load Python/3.12.3-GCCcore-13.3.0
module list
source $HOME/.virtualenvs/test/bin/activate
which python
rm -rf build
cmake -B build -S . \
-DPython3_ROOT_DIR=/usr/bin \
-DPython3_FIND_VIRTUALENV=ONLY
which results in
...
-- Found Python3: /usr/bin/python3.12 (found version "3.12.5") found components: Interpreter
...
Was there some weirder reason that did not make GROMACS work with #3282 ?
I'll try to compile my findings on the order of paths considered.
The documentation states:
`Python3_FIND_VIRTUALENV` variable can be set to one of the following:
FIRST: The virtual environment is used before any other standard paths to look-up for the interpreter. This is the default.
ONLY: Only the virtual environment is used to look-up for the interpreter.
STANDARD: The virtual environment is not used to look-up for the interpreter but environment variable PATH is always considered. In this case, variable Python3_FIND_REGISTRY (Windows) or CMAKE_FIND_FRAMEWORK (macOS) can be set with value LAST or NEVER to select preferably the interpreter from the virtual environment.
The relevant file is https://gitlab.kitware.com/cmake/cmake/-/blob/master/Modules/FindPython/Support.cmake which gets "called" by "FindPython*" with ${_PYTHON_PREFIX}
set to e.g. "Python3"
We can ignore the "STANDARD" for now because when *_FIND_VIRTUALENV
is not set to FIRST or ONLY then $CMAKE_PREFIX_PATH
is used which we set in the module.
Which leads to:
<PackageName>_ROOT
CMake variable, where <PackageName>
is the case-preserved package name.<PACKAGENAME>_ROOT
CMake variable, where <PACKAGENAME>
is the upper-cased package name.${_PYTHON_PREFIX}_ROOT_DIR
VIRTUAL_ENV
& CONDA_PREFIX
I excluded all of the ones mentioned in the docs that are excluded by the explicit NO_*
options. So we should be safe. However there is "[The first 2 steps] can be skipped by setting the CMAKE_FIND_USE_PACKAGE_ROOT_PATH
variable to FALSE
.
So maybe we need to also set the *_ROOT_DIR
env variables to be safe to not get a virtualenv.
By looking at the CMake source you linked (never realized it was so messy XD) and experimenting a bit more, I am wondering if we should just cut to the source and set Python_EXECUTABLE
and/or Python[2/3]_EXECUTABLE
to avoid going to the HINTS
part of the code.
It seems to also find the correct components when specifying COMPONENTS Development
test.sh
#!/usr/bin/env bash
module purge
module load CMake/3.29.3-GCCcore-12.3.0
module load Python/3.11.3-GCCcore-12.3.0
module load SciPy-bundle/2023.07-gfbf-2023a
module list
source $HOME/.virtualenvs/test/bin/activate
which python
_PYTHON="/home/crivella/.local/easybuild/software/Python/3.11.3-GCCcore-12.3.0/bin/python"
rm -rf build
cmake -B build -S . \
-DPython_FIND_VIRTUALENV=ONLY \
-DPython_ROOT=/usr/bin \
-DPython_EXECUTABLE=${_PYTHON} \
CMakeLists.txt
cmake_minimum_required(VERSION 3.15.0)
project(MyProject)
find_package(
Python 3.11.0...3.11.10
COMPONENTS Interpreter Development NumPy
)
CMakeCache.txt
...
//Compiler reason failure
_Python_Compiler_REASON_FAILURE:INTERNAL=
_Python_DEVELOPMENT_EMBED_SIGNATURE:INTERNAL=6e4a7d11dfc3083ba9e17dd9c0c9ffd5
_Python_DEVELOPMENT_MODULE_SIGNATURE:INTERNAL=7f9cffd5f241febb4fdd6ed780db4712
//Development reason failure
_Python_Development_REASON_FAILURE:INTERNAL=
_Python_EXECUTABLE:INTERNAL=/home/crivella/.local/easybuild/software/Python/3.11.3-GCCcore-12.3.0/bin/python
//Path to a file.
_Python_INCLUDE_DIR:INTERNAL=/home/crivella/.local/easybuild/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11
//Python Properties
_Python_INTERPRETER_PROPERTIES:INTERNAL=Python;3;11;3;64;;cpython-311-x86_64-linux-gnu;abi3;/home/crivella/.local/easybuild/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11;/home/crivella/.local/easybuild/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11;/home/crivella/.local/easybuild/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages;/home/crivella/.local/easybuild/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages
_Python_INTERPRETER_SIGNATURE:INTERNAL=a6927044a8e489828e7111e599980346
//Path to a library.
_Python_LIBRARY_RELEASE:INTERNAL=/home/crivella/.local/easybuild/software/Python/3.11.3-GCCcore-12.3.0/lib/libpython3.11.so
_Python_NUMPY_SIGNATURE:INTERNAL=64759e84174ed96a40e217a0380274d0
//Path to a file.
_Python_NumPy_INCLUDE_DIR:INTERNAL=/home/crivella/.local/easybuild/software/SciPy-bundle/2023.07-gfbf-2023a/lib/python3.11/site-packages/numpy/core/include
...
The fact that _Python_Development_REASON_FAILURE
is still there is due to it never getting unset in the code, but _Python_LIBRARY_RELEASE:
pointing to the correct .so makes me thing stuff is working correctly
I am wondering if we should just cut to the source and set
Python_EXECUTABLE
and/orPython[2/3]_EXECUTABLE
Yes I guess that would also work although I don't know if there is some (old?) CMake user code out there that relies on Python_ROOT
. We'd need to check at least the history of the module in the CMake repo.
I also guess there is no trouble in using $EBROOT_PYTHON/bin/python
(resolved by the easyblock) as the param value
I've been browsing through the CMake source code previous versions.
I've also tested up to CMake v3.12 where FindPython
was introduced and Python_EXECUTABLE
works for all of them.
I think the logic was properly implemented in 3.16, using _${PYTHON_PREFIX}_EXECUTABLE
as the store variable for find_program
but even before that, since find_program
is being called directly on ${PYTHON_PREFIX}_EXECUTABLE
due to the behavior of find_program
If the program is found the result is stored in the variable and the search will not be repeated unless the variable is cleared. If nothing is found, the result will be -NOTFOUND.
This is still working since we are manually setting the result to what we want and ${PYTHON_PREFIX}_EXECUTABLE
is not being unset by the module.
I also tested that there is no problem in case the python version does not matches the requirements since the version is later validated even if we are forcing the find_program
result.
I've made a new commit with the proposed changes (https://github.com/easybuilders/easybuild-easyblocks/pull/3463/commits/47f21b974fda5c4b4404e09f3a677de0e0667b49).
I would argue that unless we want to analyze the CMakeLists.txt we have to set both Python_EXECUTABLE
and PYTHON_EXECUTABLE
in case some code is still using the deprecated find_package{PythonInterp ...)
(unfortunately they are case sensitive)
I guess there is also an argument to be made to actually set Python_EXECUTABLE
to a dump location so that if people are making an EC for a code that requires python they are forced to use one through EB modules instead of randomly picking up a system one for reproducibility?
What exactly have you tested? Did you test 3.12 and below or 3.12 and up? Because how I understand the change in 3.16 the *_EXECUTABLE isn't honored before.
Checked 3.12 and above since before there is only FindPythonInterp
which has not been changed
So unless the python cmd set is invalid it should be kept because find_program does nothing if the variable is already set.
Yes this is what i meant in my previous comment, even if it is not explicitly the behavior of FindPython
due to the behavior of find_program
as long as we set ${PYTHON_PREFIX}_EXECUTABLE
to a correct path there should be no problem (also concerning version checks since they happens after and will give the correct error if we are passing a non supported version of python)
Test report by @Flamefire
Build succeeded for 12 out of 15 (7 easyconfigs in total) i7021 - Linux Rocky Linux 8.9 (Green Obsidian), x86_64, AMD EPYC 7702 64-Core Processor (zen2), Python 3.8.17 See https://gist.github.com/Flamefire/4db38a9b8c8fdb93627119d0443b789c for a full test report.
GROMACS 2023.x fails during the installation of the Python extension which uses CMake through pip
and then runs into the same problem. We do that extension installation (in the current way) only since 2023.x and 2024 uses scikit-build which initializes the CMakeCache with some Python variables.
I'm not sure how to solve this. The extension in this EasyConfig is a "regular" PythonPackage
but it supports an env variable CMAKE_ARGS
to pass arguments to CMake.
We could likely also set Python3_ROOT_DIR
as env variables which should also avoid it finding the virtualenv.
However: Any ideas how to pass either the arguments or the env variables to the extension? I guess the gromacs easyblock could set the env variables before installing the extensions but then it would be good to have some way to get the values from the cmake easyblock to avoid duplication
Have not worked much with extensions yet, but by looking at the code I think the easiest solution would be to modify the CMAKE_ARGS
to define either Python3_ROOT_DIR
or Python3_EXECUTABLE
.
That could be done either overriding install_extensions_sequential setting the variable before calling the original method eg
def install_extensions_sequential(self, *args, **kwargs):
"""Install the gmxapi extension"""
python_root = get_software_root('Python')
python_bin = os.path.join(python_root, 'bin', 'python')
env.setvar(
'CMAKE_ARGS',
"-Dgmxapi_ROOT=%s -DPython3_EXECUTABLE=%s -C %s/share/cmake/gromacs_mpi/gromacs-hints_mpi.cmake" %
(self.installdir, python_bin, self.installdir)
)
super(EB_GROMACS, self).install_extensions_sequential(*args, **kwargs)
or possibly defining an hook for the extension (https://github.com/easybuilders/easybuild-framework/blob/8ff6ba0d426b79a9b3da1248a7359d922b50596d/easybuild/framework/easyblock.py#L1916)
I am not sure if there is a way to modify the extension steps in some other way
However: Any ideas how to pass either the arguments or the env variables to the extension? I guess the gromacs easyblock could set the env variables before installing the extensions but then it would be good to have some way to get the values from the cmake easyblock to avoid duplication
I guess this would be a design choice. I can think of two way:
options
dictionary from the configure step to a separate function. This would allow a child easyblock to just call that function and then pick and use the options it needs (or atleast save the options after configure is called to allow for later reuse)get_cmakeopts_python
to get all cmake options related to pythonAlso I would say the 2 are not mutually exclusive and both could benefit also from implementing a scoped version of the options
dictionary where options are split into groups that make sense to recall together eg ('policies', 'generic/base', 'software_[python/...]', ...)
Have not worked much with extensions yet, but by looking at the code I think the easiest solution would be to modify the
CMAKE_ARGS
to define eitherPython3_ROOT_DIR
orPython3_EXECUTABLE
. That could be done either overriding install_extensions_sequential setting the variable before calling the original method eg
That specific variable is only for GROMACS: https://github.com/easybuilders/easybuild-easyconfigs/blob/ef56e11c4066828fa4af98b4709ba40728fc4cc6/easybuild/easyconfigs/g/GROMACS/GROMACS-2024.1-foss-2023b.eb#L78
Hence the idea for that specific easyblock to get required env vars from cmakemake (e.g. Python3_ROOT_DIR
) to set them before installing an extension. For that it might be easier, if we use those env vars in cmakemake too, instead of *_EXECUTABLE.
* Add a convenience function eg `get_cmakeopts_python` to get all cmake options related to python
Yes that is what I had in mind: get_cmake_env_vars_python
or so. Or even just get_cmake_env_vars
and it would detect which ones to set dependent on loaded modules, e.g. add the python ones if Python is somewhere in the dependencies
I'm bringing this up here to have a method that can be used in all similar places. And GROMACS was the only example I know of where the virtual env caused problems before. I was thinking if you could access the configopts
from the parent EC in the extension but don't think so. So we need something else, hence the brainstorming
I tested the snippet in the previous comment (added to the gromacs easyblock) and it works also by passing Python3_EXECUTABLE
to CMAKE_ARGS
. I think it is just about making the find_package
here happy.
And GROMACS was the only example I know of where the virtual env caused problems before
I've also this problem in the LLVM easyblock, since I have to do a multistage build while changing CMake variables across the stages. I ended up just reimplementing part of the CMakeMake easyblock and working with a new dictionary, but also there it might help to be able to inherit all/selectively the options from the CMakeMake.configure
I tested the snippet in the previous comment (added to the gromacs easyblock) and it works also by passing
Python3_EXECUTABLE
toCMAKE_ARGS
. I think it is just about making thefind_package
here happy.
I'm not sure how that works from within the easyblock as the variable is set in the easyconfig. So as far as I see even if you set it in the easyblock the easyconfig will overwrite this. Or what exactly did you change and which EC did you test? GROMACS-2023.1-foss-2022a.eb
from within a minimal virtualenv is one that fails for me. One of my tests uses export Python3_ROOT_DIR=$EBROOTPYTHON && export CMAKE_ARGS=...
. This is running over the (here extended) weekend
I've also this problem in the LLVM easyblock, since I have to do a multistage build while changing CMake variables across the stages. I ended up just reimplementing part of the CMakeMake easyblock and working with a new dictionary, but also there it might help to be able to inherit all/selectively the options from the CMakeMake.configure
That might be a valuable datapoint: If it works easily for GROMACS base, the extension, and LLVM we have quite a few cases covered. But I guess for LLVM setting either the env variable or the *_EXECUTABLE
argument would work as you call the cmakemake configure function
I'm just wondering if using the env variables is more flexible as we won't need to figure out how to pass flags to all the cmake invocation, such as in the pip install
case with some custom env variable that might/will be different for each software if there even is any.
I'm not sure how that works from within the easyblock as the variable is set in the easyconfig.
Sorry forgot to mention i also removed the preinstallopts
from the easyconfig
I'm just wondering if using the env variables is more flexible as we won't need to figure out how to pass flags to all the cmake invocation, such as in the pip install case with some custom env variable that might/will be different for each software if there even is any.
I am not sure, I think the env is not preserved (or reinitialized) across steps so it is possible the environment variable will have to be set in the needed step on a case by case basis? Not entirely sure, but before i tried adding that snippet before the call to easyblock.extensions_step, and it was carried to the installation of the extension
I am not sure, I think the env is not preserved (or reinitialized) across steps so it is possible the environment variable will have to be set in the needed step on a case by case basis? Not entirely sure, but before i tried adding that snippet before the call to easyblock.extensions_step, and it was carried to the installation of the extension
IIRC the environment is (partially?) reset when loading the fake module for the extension installation. But I think we can use either prepare_for_extensions
or install_extensions
to set the env variables.
And yes I guess that would be easyblock specific. But I imagine something like this for easyblocks like GROMACS:
def prepare_for_extensions(self):
super().prepare_for_extensions()
set_cmake_env_vars_python()
That would be slightly more generic than only getting the -D
options. Although I do like to be able to directly set the python executables. Especially as that is more "focused/controlled" than an env variable. But not sure if it is always possible
I've added both the storing of the generated options
inside the class and the suggested convenience function.
Indeed when calling CMake from within a build process there is no defined way to pass it arguments and it will dependent on the implementation.
Concerning the usage of the environment variables one thing to be careful of: i could also see the edge case where the buldsystem itself reset/starts from a new environment before making the call to CMake, but at that point I do not think there is much that can be done beside abiding to the buildsystem way of passing args to CMake or patching.
@boegelbot please test @ generoso EB_ARGS="GROMACS-2023.3-foss-2023a.eb GROMACS-2021.3-foss-2021a.eb HOOMD-blue-4.0.1-foss-2022a.eb vcflib-1.0.9-gfbf-2023a-R-4.3.2.eb --installpath /tmp/$USER/pr3463" CORE_CNT=16
@ocaisa: Request for testing this PR well received on login1
PR test command 'EB_PR=3463 EB_ARGS="GROMACS-2023.3-foss-2023a.eb GROMACS-2021.3-foss-2021a.eb HOOMD-blue-4.0.1-foss-2022a.eb vcflib-1.0.9-gfbf-2023a-R-4.3.2.eb --installpath /tmp/$USER/pr3463" EB_CONTAINER= EB_REPO=easybuild-easyblocks /opt/software/slurm/bin/sbatch --job-name test_PR_3463 --ntasks="16" ~/boegelbot/eb_from_pr_upload_generoso.sh
' executed!
Submitted batch job 14447
Test results coming soon (I hope)...
Test report by @boegelbot
Build succeeded for 12 out of 12 (4 easyconfigs in total) cnx2 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8 See https://gist.github.com/boegelbot/56cfbaeef38b5976a67ec8f51229611e for a full test report.
Test report by @boegel
Build succeeded for 56 out of 58 (58 easyconfigs in total) node4002.donphan.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz, 1 x NVIDIA NVIDIA A2, 545.23.08, Python 3.6.8 See https://gist.github.com/boegel/01bd2c5d78f5e0776a55c63ceeb283f0 for a full test report.
PyTorch-bundle-2.1.2-foss-2023a.eb
failed due with error -9
, so probably a segfault (could be due to limited available memory in the test environment, only ~27GBRDKit-2022.03.5-foss-2021b.eb
failed with error below, has nothing to do with the changes being made here
Downloading https://fonts.google.com/download?family=Comic%20Neue...
CMake Error at Code/cmake/Modules/RDKitUtils.cmake:257 (MESSAGE):
The md5 checksum for
/tmp/easybuild_build/RDKit/2022.03.5/foss-2021b/rdkit-Release_2022_03_5/Code/GraphMol/MolDraw2D/Comic_Neue.zip
is incorrect; expected: 850b0df852f1cda4970887b540f8f333, found:
4cd32046e7d03f3905d0d6c433bfae43
EDIT
Following the thread discussion this has changed from setting
Python3_FIND_VIRTUALENV
to settingPython_EXECUTABLE
and all its variants in order to force the CMakeMake easyblock to pick up the correct python when loaded as a module.This would replace:
3282
OLD
This is related to:
FindPython3
viafind_package(Python3)
Python3_FIND_VIRTUALENV
controls how CMake finds the python3 interpreter in case avirtualenv
orconda
environment is activated, by default prioritizing the virtual environment. Setting it toSTANDARD
switch the behavior to looking for python3 inPATH
which allows the correct python to be found when invokingeb
from within a virtual environmentNo version check has been added as the flag is supported since CMake 3.15 which now is tied to an archived toolchain (GCC 8.3.0 https://docs.easybuild.io/policies/toolchains/)
I think this should always be on by default. Even if a build process tries to work with virtual environments (eg creating a venv, activating it and tha running cmake trying to use the python from the venv) i think that activating the virtualenv inside the build process would prepend to the PATH and still allow finding the correct python