Update or drop the activation script that sets `CUDA_PATH`?

leofang commented 1 year ago

Currently we point CUDA_PATH to the conda env prefix https://github.com/conda-forge/cupy-feedstock/blob/1788a9c9970bbf09e783d64e9fede420cb32b18f/recipe/activate.sh#L8 because we wanted to mimic a real CTK existing somewhere that's visible at run time to CuPy. The activation (deactivation) script is called when the conda env containing a CuPy package is activated (deactivated), during which CUDA_PATH is set (unset). However, this trick is only applicable to the "old" layout (pre-CUDA 12). For CUDA 12, this would likely not work very well. I think we'd need something like export CUDA_PATH=$CONDA_PREFIX/targets/${targetDir} at least, for the new layout.

Also, in the past I've occasionally heard complaints about us overwriting CUDA_PATH for users. It may prompt the consideration of dropping it all together. But unfortunately I do not have time to investigate the consequence of doing so.

cc: @jakirkham @kmaehashi

function2-llx commented 11 months ago

@leofang Hello, I can provide how my usage situation is affected by the behavior setting CUDA_PATH to $CONDA_PREFIX. I install both CuPy and PyTorch in one virtual environment. I want to build a PyTorch extension, but it will fail due to overriding the CUDA_PATH environment variable will make PyTorch fail to find nvcc.

https://github.com/pytorch/pytorch/blob/v2.1.0/torch/utils/cpp_extension.py#L395-L400

def _check_cuda_version(compiler_name: str, compiler_version: TorchVersion) -> None:
    if not CUDA_HOME:
        raise RuntimeError(CUDA_NOT_FOUND_MESSAGE)

    nvcc = os.path.join(CUDA_HOME, 'bin', 'nvcc')
    cuda_version_str = subprocess.check_output([nvcc, '--version']).strip().decode(*SUBPROCESS_DECODE_ARGS)
...

where the value of the CUDA_HOME variable is determined by the following function: https://github.com/pytorch/pytorch/blob/v2.1.0/torch/utils/cpp_extension.py#L93C1-L121

def _find_cuda_home() -> Optional[str]:
    ...
    cuda_home = os.environ.get('CUDA_HOME') or os.environ.get('CUDA_PATH')
    if cuda_home is None:
    ...
    if cuda_home and not torch.cuda.is_available():
        print(f"No CUDA runtime is found, using CUDA_HOME='{cuda_home}'",
              file=sys.stderr)
    return cuda_home

leofang commented 11 months ago

I think you can just set an env var CUDA_HOME pointing to your local CUDA installation.

leofang commented 10 months ago

Possible options:

Update the (de-)activate scripts for different layouts: a. CUDA 11: no change b. CUDA 12:
- Linux: export CUDA_PATH=$CONDA_PREFIX/targets/${targetDir}
- Windows: set CUDA_PATH=$CONDA_PREFIX\Library (#232)
Remove the (de-)activate scripts: a. This could be a breaking change, but TBH I prefer this; I no longer feel comfortable about meddling with users' env vars, which are globally visible and not just effective when using CuPy.

jakirkham commented 10 months ago

If we drop the activation scripts, would it make sense to have some way to silence this CuPy warning about CUDA_PATH?

Can understand that warning in the general case. However for Conda packaging with a known layout, wonder if that warning is more confusing to users (and whether users setting it would cause hard to diagnose issues with Conda packages)

leofang commented 10 months ago

We can certainly do that. (We can patch it here for v12, and upstream the patch for v13.) @kmaehashi Do I miss anything?

jakirkham commented 10 months ago

FWIW Numba uses this trick to check for a Conda environment. Maybe CuPy could leverage similar logic?

leofang commented 10 months ago

Oh we already have a detection mechanism via reading the json file shipped with the conda cupy package, so detecting whether inside a conda env is not a problem.

leofang commented 10 months ago

(btw why wouldn't Numba just check if the env var CONDA_PREFIX is set?)

jakirkham commented 10 months ago

The logic was added in PR ( https://github.com/numba/numba/pull/4096 ). Though don't see a mention of this consideration

If I had to guess, it is so that it will work even if the environment is not activated

kmaehashi commented 9 months ago

If we drop the activation scripts, would it make sense to have some way to silence this CuPy warning about CUDA_PATH?

The warning in question is to notify users about (possibly) missing CUDA shared library installation on Windows. In conda environment this should not happen as dependencies are satisfied by the package manager. I think it's fine to disable this warning in conda environment. -> I've made a PR for this: #8051

Despite that, I guess it's better to avoid modifying CUDA_PATH in the activation scripts because CuPy uses $CUDA_PATH/bin/nvcc as the default nvcc path (e.g. used in RawModule(backend='nvcc'))

mtjrider commented 8 months ago

@leofang encouraged me to comment here. In multi-stage build pipelines where conda is used to manage dependencies in conjunction with other (build) tools, I have to reset CUDA_PATH because it is corrupted by package installation.

What is the primary reason CUDA_PATH is set by cupy?

jakirkham commented 8 months ago

FWIW the warning referenced above was removed in CuPy 13 ( https://github.com/cupy/cupy/pull/8076 )

Are there any other action items before making this change?

mtjrider commented 8 months ago

@leofang encouraged me to comment here. In multi-stage build pipelines where conda is used to manage dependencies in conjunction with other (build) tools, I have to reset CUDA_PATH because it is corrupted by package installation.

What is the primary reason CUDA_PATH is set by cupy?

Not trying to be noisy but wanted to capture more detail. If in the middle of a build, cupy is installed, and the associated environment is active, then CUDA_PATH needs to be reset for follow-up compilation where CUDA_PATH is used.

Likewise, if another library built with CFFI bindings (or similar) relies on the environment variable CUDA_PATH being set, then this library will be broken alongside cupy by default.

leofang commented 8 months ago

Thanks all, the new build has the activation scripts purged. @mtjrider would it be possible if you retry your workflow with the latest cupy from conda-forge?

jakirkham commented 8 months ago

Wonder if we should mark the previous CuPy 13.0.0 builds broken so that users only get packages without the activation script when installing CuPy 13.0.0

Edit: After offline discussion with @leofang and @mtjrider , we thought this would be a reasonable change. Filed issue ( https://github.com/conda-forge/cupy-feedstock/issues/255 ) to track and document next steps

conda-forge / cupy-feedstock

Update or drop the activation script that sets `CUDA_PATH`? #206