easybuilders / easybuild-easyblocks

Collection of easyblocks that implement support for building and installing software with EasyBuild.
https://easybuild.io
GNU General Public License v2.0
106 stars 285 forks source link

let `CMakeMake` easyblock also set `Python_EXECUTABLE` option, as well as `Python3_EXECUTABLE` and `Python2_EXECUTABLE` derivatives (when appropriate) #3463

Closed Crivella closed 1 month ago

Crivella commented 1 month ago

EDIT

Following the thread discussion this has changed from setting Python3_FIND_VIRTUALENV to setting Python_EXECUTABLE and all its variants in order to force the CMakeMake easyblock to pick up the correct python when loaded as a module.

This would replace:

OLD

This is related to:

Python3_FIND_VIRTUALENV controls how CMake finds the python3 interpreter in case a virtualenv or conda environment is activated, by default prioritizing the virtual environment. Setting it to STANDARD switch the behavior to looking for python3 in PATH which allows the correct python to be found when invoking eb from within a virtual environment

No version check has been added as the flag is supported since CMake 3.15 which now is tied to an archived toolchain (GCC 8.3.0 https://docs.easybuild.io/policies/toolchains/)

I think this should always be on by default. Even if a build process tries to work with virtual environments (eg creating a venv, activating it and tha running cmake trying to use the python from the venv) i think that activating the virtualenv inside the build process would prepend to the PATH and still allow finding the correct python

ocaisa commented 1 month ago

@Flamefire You could probably give the most informed review of this. I would be inclined to accept it.

Flamefire commented 1 month ago

I don't see any problem with this in general besides increasing the number of (potentially superflous) parameters we pass to configure.

However: The GROMACS change was very focused on that specific software and restricted to "faulty" versions. Here we'd not only need to pass Python3_FIND_VIRTUALENV but also Python_FIND_VIRTUALENV and Python_FIND_VIRTUALENV to cover all cases.

I'd prefer to pass the Python binary path explicitly as done in this PR This not only avoids it finding a virtualenv Python ("fixed" here) but also covers many other cases we might not found yet.
So that is more general without unnecessarily increasing verbosity for cases where we don't need it.

@Crivella What do you think about that? Can you try #3282 with your https://github.com/easybuilders/easybuild-easyblocks/pull/3373 not setting the Python3_FIND_VIRTUALENV to see if that works?

Crivella commented 1 month ago

@Flamefire Started the test it will take a couple hours to finish, but since stage 1 passed i don't think there would be a problem. My concern was the order with which the hints where evaluated since #3283 was necessary, but Python_ROOT_DIR seems to take precedence above all

Just ran a minimal test with

    cmake_minimum_required(VERSION 3.15.0)

    project(MyProject)

    find_package(Python3)

and

#!/usr/bin/env bash

module purge
module load CMake/3.29.3-GCCcore-13.3.0
module load Python/3.12.3-GCCcore-13.3.0
module list

source $HOME/.virtualenvs/test/bin/activate
which python

rm -rf build
cmake -B build -S . \
    -DPython3_ROOT_DIR=/usr/bin \
    -DPython3_FIND_VIRTUALENV=ONLY

which results in

...
-- Found Python3: /usr/bin/python3.12 (found version "3.12.5") found components: Interpreter
...

Was there some weirder reason that did not make GROMACS work with #3282 ?

Flamefire commented 1 month ago

I'll try to compile my findings on the order of paths considered.

The documentation states:

    `Python3_FIND_VIRTUALENV`  variable can be set to one of the following:

        FIRST: The virtual environment is used before any other standard paths to look-up for the interpreter. This is the default.

        ONLY: Only the virtual environment is used to look-up for the interpreter.

        STANDARD: The virtual environment is not used to look-up for the interpreter but environment variable PATH is always considered. In this case, variable Python3_FIND_REGISTRY (Windows) or CMAKE_FIND_FRAMEWORK (macOS) can be set with value LAST or NEVER to select preferably the interpreter from the virtual environment.

The relevant file is https://gitlab.kitware.com/cmake/cmake/-/blob/master/Modules/FindPython/Support.cmake which gets "called" by "FindPython*" with ${_PYTHON_PREFIX} set to e.g. "Python3"

We can ignore the "STANDARD" for now because when *_FIND_VIRTUALENV is not set to FIRST or ONLY then $CMAKE_PREFIX_PATH is used which we set in the module.

Which leads to:

And find_program considers:

  1. <PackageName>_ROOT CMake variable, where <PackageName> is the case-preserved package name.
  2. <PACKAGENAME>_ROOT CMake variable, where <PACKAGENAME> is the upper-cased package name.
  3. HINTS: ENV variable ${_PYTHON_PREFIX}_ROOT_DIR
  4. PATHS: ENV variables VIRTUAL_ENV & CONDA_PREFIX

I excluded all of the ones mentioned in the docs that are excluded by the explicit NO_* options. So we should be safe. However there is "[The first 2 steps] can be skipped by setting the CMAKE_FIND_USE_PACKAGE_ROOT_PATH variable to FALSE.

So maybe we need to also set the *_ROOT_DIR env variables to be safe to not get a virtualenv.

Crivella commented 1 month ago

By looking at the CMake source you linked (never realized it was so messy XD) and experimenting a bit more, I am wondering if we should just cut to the source and set Python_EXECUTABLE and/or Python[2/3]_EXECUTABLE to avoid going to the HINTS part of the code.

It seems to also find the correct components when specifying COMPONENTS Development

test.sh

#!/usr/bin/env bash

module purge
module load CMake/3.29.3-GCCcore-12.3.0
module load Python/3.11.3-GCCcore-12.3.0
module load SciPy-bundle/2023.07-gfbf-2023a
module list

source $HOME/.virtualenvs/test/bin/activate
which python

_PYTHON="/home/crivella/.local/easybuild/software/Python/3.11.3-GCCcore-12.3.0/bin/python"

rm -rf build
cmake -B build -S . \
    -DPython_FIND_VIRTUALENV=ONLY \
    -DPython_ROOT=/usr/bin \
    -DPython_EXECUTABLE=${_PYTHON} \

CMakeLists.txt

cmake_minimum_required(VERSION 3.15.0)

project(MyProject)

find_package(
    Python 3.11.0...3.11.10
    COMPONENTS Interpreter Development NumPy
    )

CMakeCache.txt

...
//Compiler reason failure
_Python_Compiler_REASON_FAILURE:INTERNAL=
_Python_DEVELOPMENT_EMBED_SIGNATURE:INTERNAL=6e4a7d11dfc3083ba9e17dd9c0c9ffd5
_Python_DEVELOPMENT_MODULE_SIGNATURE:INTERNAL=7f9cffd5f241febb4fdd6ed780db4712
//Development reason failure
_Python_Development_REASON_FAILURE:INTERNAL=
_Python_EXECUTABLE:INTERNAL=/home/crivella/.local/easybuild/software/Python/3.11.3-GCCcore-12.3.0/bin/python
//Path to a file.
_Python_INCLUDE_DIR:INTERNAL=/home/crivella/.local/easybuild/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11
//Python Properties
_Python_INTERPRETER_PROPERTIES:INTERNAL=Python;3;11;3;64;;cpython-311-x86_64-linux-gnu;abi3;/home/crivella/.local/easybuild/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11;/home/crivella/.local/easybuild/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11;/home/crivella/.local/easybuild/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages;/home/crivella/.local/easybuild/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages
_Python_INTERPRETER_SIGNATURE:INTERNAL=a6927044a8e489828e7111e599980346
//Path to a library.
_Python_LIBRARY_RELEASE:INTERNAL=/home/crivella/.local/easybuild/software/Python/3.11.3-GCCcore-12.3.0/lib/libpython3.11.so
_Python_NUMPY_SIGNATURE:INTERNAL=64759e84174ed96a40e217a0380274d0
//Path to a file.
_Python_NumPy_INCLUDE_DIR:INTERNAL=/home/crivella/.local/easybuild/software/SciPy-bundle/2023.07-gfbf-2023a/lib/python3.11/site-packages/numpy/core/include
...

The fact that _Python_Development_REASON_FAILURE is still there is due to it never getting unset in the code, but _Python_LIBRARY_RELEASE: pointing to the correct .so makes me thing stuff is working correctly

Flamefire commented 1 month ago

I am wondering if we should just cut to the source and set Python_EXECUTABLE and/or Python[2/3]_EXECUTABLE

Yes I guess that would also work although I don't know if there is some (old?) CMake user code out there that relies on Python_ROOT. We'd need to check at least the history of the module in the CMake repo.

I also guess there is no trouble in using $EBROOT_PYTHON/bin/python (resolved by the easyblock) as the param value

Crivella commented 1 month ago

I've been browsing through the CMake source code previous versions. I've also tested up to CMake v3.12 where FindPython was introduced and Python_EXECUTABLE works for all of them.

I think the logic was properly implemented in 3.16, using _${PYTHON_PREFIX}_EXECUTABLE as the store variable for find_program but even before that, since find_program is being called directly on ${PYTHON_PREFIX}_EXECUTABLE due to the behavior of find_program

If the program is found the result is stored in the variable and the search will not be repeated unless the variable is cleared. If nothing is found, the result will be -NOTFOUND.

This is still working since we are manually setting the result to what we want and ${PYTHON_PREFIX}_EXECUTABLE is not being unset by the module.

I also tested that there is no problem in case the python version does not matches the requirements since the version is later validated even if we are forcing the find_program result.

I've made a new commit with the proposed changes (https://github.com/easybuilders/easybuild-easyblocks/pull/3463/commits/47f21b974fda5c4b4404e09f3a677de0e0667b49).

I would argue that unless we want to analyze the CMakeLists.txt we have to set both Python_EXECUTABLE and PYTHON_EXECUTABLE in case some code is still using the deprecated find_package{PythonInterp ...) (unfortunately they are case sensitive)

Crivella commented 1 month ago

I guess there is also an argument to be made to actually set Python_EXECUTABLE to a dump location so that if people are making an EC for a code that requires python they are forced to use one through EB modules instead of randomly picking up a system one for reproducibility?

Crivella commented 1 month ago

What exactly have you tested? Did you test 3.12 and below or 3.12 and up? Because how I understand the change in 3.16 the *_EXECUTABLE isn't honored before.

Checked 3.12 and above since before there is only FindPythonInterp which has not been changed

So unless the python cmd set is invalid it should be kept because find_program does nothing if the variable is already set.

Yes this is what i meant in my previous comment, even if it is not explicitly the behavior of FindPython due to the behavior of find_program as long as we set ${PYTHON_PREFIX}_EXECUTABLE to a correct path there should be no problem (also concerning version checks since they happens after and will give the correct error if we are passing a non supported version of python)

Flamefire commented 1 month ago

Test report by @Flamefire

Overview of tested easyconfigs (in order)

Build succeeded for 12 out of 15 (7 easyconfigs in total) i7021 - Linux Rocky Linux 8.9 (Green Obsidian), x86_64, AMD EPYC 7702 64-Core Processor (zen2), Python 3.8.17 See https://gist.github.com/Flamefire/4db38a9b8c8fdb93627119d0443b789c for a full test report.

Flamefire commented 1 month ago

GROMACS 2023.x fails during the installation of the Python extension which uses CMake through pip and then runs into the same problem. We do that extension installation (in the current way) only since 2023.x and 2024 uses scikit-build which initializes the CMakeCache with some Python variables.

I'm not sure how to solve this. The extension in this EasyConfig is a "regular" PythonPackage but it supports an env variable CMAKE_ARGS to pass arguments to CMake.

We could likely also set Python3_ROOT_DIR as env variables which should also avoid it finding the virtualenv.

However: Any ideas how to pass either the arguments or the env variables to the extension? I guess the gromacs easyblock could set the env variables before installing the extensions but then it would be good to have some way to get the values from the cmake easyblock to avoid duplication

Crivella commented 1 month ago

Have not worked much with extensions yet, but by looking at the code I think the easiest solution would be to modify the CMAKE_ARGS to define either Python3_ROOT_DIR or Python3_EXECUTABLE. That could be done either overriding install_extensions_sequential setting the variable before calling the original method eg

    def install_extensions_sequential(self, *args, **kwargs):
        """Install the gmxapi extension"""
        python_root = get_software_root('Python')
        python_bin = os.path.join(python_root, 'bin', 'python')
        env.setvar(
            'CMAKE_ARGS', 
            "-Dgmxapi_ROOT=%s -DPython3_EXECUTABLE=%s -C %s/share/cmake/gromacs_mpi/gromacs-hints_mpi.cmake" % 
            (self.installdir, python_bin, self.installdir)
            )
        super(EB_GROMACS, self).install_extensions_sequential(*args, **kwargs)

or possibly defining an hook for the extension (https://github.com/easybuilders/easybuild-framework/blob/8ff6ba0d426b79a9b3da1248a7359d922b50596d/easybuild/framework/easyblock.py#L1916)

I am not sure if there is a way to modify the extension steps in some other way

However: Any ideas how to pass either the arguments or the env variables to the extension? I guess the gromacs easyblock could set the env variables before installing the extensions but then it would be good to have some way to get the values from the cmake easyblock to avoid duplication

I guess this would be a design choice. I can think of two way:

Also I would say the 2 are not mutually exclusive and both could benefit also from implementing a scoped version of the options dictionary where options are split into groups that make sense to recall together eg ('policies', 'generic/base', 'software_[python/...]', ...)

Flamefire commented 1 month ago

Have not worked much with extensions yet, but by looking at the code I think the easiest solution would be to modify the CMAKE_ARGS to define either Python3_ROOT_DIR or Python3_EXECUTABLE. That could be done either overriding install_extensions_sequential setting the variable before calling the original method eg

That specific variable is only for GROMACS: https://github.com/easybuilders/easybuild-easyconfigs/blob/ef56e11c4066828fa4af98b4709ba40728fc4cc6/easybuild/easyconfigs/g/GROMACS/GROMACS-2024.1-foss-2023b.eb#L78

Hence the idea for that specific easyblock to get required env vars from cmakemake (e.g. Python3_ROOT_DIR) to set them before installing an extension. For that it might be easier, if we use those env vars in cmakemake too, instead of *_EXECUTABLE.

* Add a convenience function eg `get_cmakeopts_python` to get all cmake options related to python

Yes that is what I had in mind: get_cmake_env_vars_python or so. Or even just get_cmake_env_vars and it would detect which ones to set dependent on loaded modules, e.g. add the python ones if Python is somewhere in the dependencies

I'm bringing this up here to have a method that can be used in all similar places. And GROMACS was the only example I know of where the virtual env caused problems before. I was thinking if you could access the configopts from the parent EC in the extension but don't think so. So we need something else, hence the brainstorming

Crivella commented 1 month ago

I tested the snippet in the previous comment (added to the gromacs easyblock) and it works also by passing Python3_EXECUTABLE to CMAKE_ARGS. I think it is just about making the find_package here happy.

And GROMACS was the only example I know of where the virtual env caused problems before

I've also this problem in the LLVM easyblock, since I have to do a multistage build while changing CMake variables across the stages. I ended up just reimplementing part of the CMakeMake easyblock and working with a new dictionary, but also there it might help to be able to inherit all/selectively the options from the CMakeMake.configure

Flamefire commented 1 month ago

I tested the snippet in the previous comment (added to the gromacs easyblock) and it works also by passing Python3_EXECUTABLE to CMAKE_ARGS. I think it is just about making the find_package here happy.

I'm not sure how that works from within the easyblock as the variable is set in the easyconfig. So as far as I see even if you set it in the easyblock the easyconfig will overwrite this. Or what exactly did you change and which EC did you test? GROMACS-2023.1-foss-2022a.eb from within a minimal virtualenv is one that fails for me. One of my tests uses export Python3_ROOT_DIR=$EBROOTPYTHON && export CMAKE_ARGS=.... This is running over the (here extended) weekend

I've also this problem in the LLVM easyblock, since I have to do a multistage build while changing CMake variables across the stages. I ended up just reimplementing part of the CMakeMake easyblock and working with a new dictionary, but also there it might help to be able to inherit all/selectively the options from the CMakeMake.configure

That might be a valuable datapoint: If it works easily for GROMACS base, the extension, and LLVM we have quite a few cases covered. But I guess for LLVM setting either the env variable or the *_EXECUTABLE argument would work as you call the cmakemake configure function

I'm just wondering if using the env variables is more flexible as we won't need to figure out how to pass flags to all the cmake invocation, such as in the pip install case with some custom env variable that might/will be different for each software if there even is any.

Crivella commented 1 month ago

I'm not sure how that works from within the easyblock as the variable is set in the easyconfig.

Sorry forgot to mention i also removed the preinstallopts from the easyconfig

I'm just wondering if using the env variables is more flexible as we won't need to figure out how to pass flags to all the cmake invocation, such as in the pip install case with some custom env variable that might/will be different for each software if there even is any.

I am not sure, I think the env is not preserved (or reinitialized) across steps so it is possible the environment variable will have to be set in the needed step on a case by case basis? Not entirely sure, but before i tried adding that snippet before the call to easyblock.extensions_step, and it was carried to the installation of the extension

Flamefire commented 1 month ago

I am not sure, I think the env is not preserved (or reinitialized) across steps so it is possible the environment variable will have to be set in the needed step on a case by case basis? Not entirely sure, but before i tried adding that snippet before the call to easyblock.extensions_step, and it was carried to the installation of the extension

IIRC the environment is (partially?) reset when loading the fake module for the extension installation. But I think we can use either prepare_for_extensions or install_extensions to set the env variables.
And yes I guess that would be easyblock specific. But I imagine something like this for easyblocks like GROMACS:

def prepare_for_extensions(self):
    super().prepare_for_extensions()
    set_cmake_env_vars_python()

That would be slightly more generic than only getting the -D options. Although I do like to be able to directly set the python executables. Especially as that is more "focused/controlled" than an env variable. But not sure if it is always possible

Crivella commented 1 month ago

I've added both the storing of the generated options inside the class and the suggested convenience function. Indeed when calling CMake from within a build process there is no defined way to pass it arguments and it will dependent on the implementation.

Concerning the usage of the environment variables one thing to be careful of: i could also see the edge case where the buldsystem itself reset/starts from a new environment before making the call to CMake, but at that point I do not think there is much that can be done beside abiding to the buildsystem way of passing args to CMake or patching.

ocaisa commented 1 month ago

@boegelbot please test @ generoso EB_ARGS="GROMACS-2023.3-foss-2023a.eb GROMACS-2021.3-foss-2021a.eb HOOMD-blue-4.0.1-foss-2022a.eb vcflib-1.0.9-gfbf-2023a-R-4.3.2.eb --installpath /tmp/$USER/pr3463" CORE_CNT=16

boegelbot commented 1 month ago

@ocaisa: Request for testing this PR well received on login1

PR test command 'EB_PR=3463 EB_ARGS="GROMACS-2023.3-foss-2023a.eb GROMACS-2021.3-foss-2021a.eb HOOMD-blue-4.0.1-foss-2022a.eb vcflib-1.0.9-gfbf-2023a-R-4.3.2.eb --installpath /tmp/$USER/pr3463" EB_CONTAINER= EB_REPO=easybuild-easyblocks /opt/software/slurm/bin/sbatch --job-name test_PR_3463 --ntasks="16" ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

Test results coming soon (I hope)...

*- notification for comment with ID 2407211069 processed* *Message to humans: this is just bookkeeping information for me, it is of no use to you (unless you think I have a bug, which I don't).*
boegelbot commented 1 month ago

Test report by @boegelbot

Overview of tested easyconfigs (in order)

Build succeeded for 12 out of 12 (4 easyconfigs in total) cnx2 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8 See https://gist.github.com/boegelbot/56cfbaeef38b5976a67ec8f51229611e for a full test report.

boegel commented 1 month ago

Test report by @boegel

Overview of tested easyconfigs (in order)

Build succeeded for 56 out of 58 (58 easyconfigs in total) node4002.donphan.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz, 1 x NVIDIA NVIDIA A2, 545.23.08, Python 3.6.8 See https://gist.github.com/boegel/01bd2c5d78f5e0776a55c63ceeb283f0 for a full test report.

boegel commented 1 month ago