NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License

Installation with CUDA extensions is failing #1831

Open SaiedaJN opened 3 months ago

SaiedaJN commented 3 months ago

After installing apex 0.1, I got this error:

raise RuntimeError('apex.optimizers.FusedLAMB requires cuda extensions')

RuntimeError: apex.optimizers.FusedLAMB requires cuda extensions

I tried the CUDA extension install options from the README, but they did not work:

pip3 install -v --deprecated_fused_lamb --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Usage:
  pip3 install [options] [package-index-options] ...
  pip3 install [options] -r [package-index-options] ...
  pip3 install [options] [-e] ...
  pip3 install [options] [-e] ...
  pip3 install [options] <archive url/path> ...

no such option: --deprecated_fused_lamb

I then found this command in a similar GitHub issue:

python setup.py install --cuda_ext

and got:

RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries. Pytorch binaries were compiled with Cuda 12.1.

I have:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0

import torch
torch.__version__
'2.4.0+cu121'

GPU: A100

Thanks for any assistance.

XYkong-CS commented 1 month ago

The version of torch should match the version of nvcc, so perhaps you should reinstall torch under cu122.

lix19937 commented 1 month ago

RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries. Pytorch binaries were compiled with Cuda 12.1.

This version mismatch is the root cause.
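To see the mismatch directly, the two versions reported in this thread can be compared with a few lines of Python. This is only a minimal sketch: the helper names (`cuda_release_from_nvcc`, `cuda_release_from_torch`, `classify_mismatch`) are hypothetical and not part of apex or PyTorch, and the sketch does not claim to reproduce the exact check in apex's setup.py. It only parses the `release X.Y` text printed by `nvcc --version` and the string that `torch.version.cuda` reports.

```python
import re
from typing import Tuple


def cuda_release_from_nvcc(nvcc_output: str) -> Tuple[int, int]:
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def cuda_release_from_torch(torch_cuda: str) -> Tuple[int, int]:
    """Parse a version string such as '12.1' (the format of torch.version.cuda)."""
    major, minor = torch_cuda.split(".")[:2]
    return int(major), int(minor)


def classify_mismatch(nvcc_output: str, torch_cuda: str) -> str:
    """Return 'match', 'minor mismatch', or 'major mismatch'."""
    toolkit = cuda_release_from_nvcc(nvcc_output)
    binary = cuda_release_from_torch(torch_cuda)
    if toolkit == binary:
        return "match"
    if toolkit[0] == binary[0]:
        return "minor mismatch"
    return "major mismatch"


# Values copied from this issue: nvcc reports release 12.2,
# while the torch build is tagged cu121 (CUDA 12.1).
nvcc_text = "Cuda compilation tools, release 12.2, V12.2.91"
print(classify_mismatch(nvcc_text, "12.1"))  # prints "minor mismatch"

# On a real machine (assuming torch and nvcc are both installed):
#   import subprocess, torch
#   nvcc_text = subprocess.check_output(["nvcc", "--version"], text=True)
#   print(classify_mismatch(nvcc_text, torch.version.cuda))
```

If the result is anything other than "match", the build will likely hit the RuntimeError quoted above until the local CUDA toolkit and the CUDA version of the installed torch build are aligned.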