Closed: ptheywood closed this issue 1 month ago
Good news. Owain Kenway's Twitter feed has been full of pained remarks about how tricky it is to build pytorch wheels on GH for several weeks now.
It's easy to build but hard to replicate the performance of the wheel from the NGC container.
Currently my builds are about 10% slower running Stable Diffusion XL inference and I can't work out why.
Hi - @owainkenwayucl, do you have some documents somewhere on how you accomplished that? I'm happy to eat the 10% perf hit if it keeps me from having to decipher incredibly confusing compiler errors.
Nightly Pytorch 2.4 and 2.5 builds using CUDA 12.4 via pip include linux-aarch64 builds with CUDA support (nightly channel pytorch list).
The wheels are very large (2354.8 MB for Python 3.9), as they include all the CUDA deps rather than depending on an external package.
CUDA 11.8 and 12.2 nightly builds still do not include CUDA support.
Conda nightly packages do not include linux-aarch64 builds at all, just osx-arm64, linux-64 and win-64, but installing via pip into a conda env seems to work.
python3 -m venv venv-pytorch-nightly
source venv-pytorch-nightly/bin/activate
python3 -m pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu124
python3 -c "import torch; print(torch.__version__); print(torch.cuda.is_available()); print(torch.cuda.get_arch_list())"
/path/to/venv-pytorch-nightly/lib64/python3.9/site-packages/torch/_subclasses/functional_tensor.py:258: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
cpu = _conversion_method_template(device=torch.device("cpu"))
2.5.0.dev20240621+cu124
True
['sm_50', 'sm_80', 'sm_86', 'sm_89', 'sm_90', 'sm_90a']
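For reference, the sm_XX entries reported above are CUDA compute capabilities. A stdlib-only sketch decoding them into architecture names (the ARCH_NAMES table and describe() helper are illustrative, not part of torch):

```python
# Illustrative mapping from torch.cuda.get_arch_list() entries to NVIDIA
# architecture names; ARCH_NAMES and describe() are hypothetical helpers,
# not part of torch itself.
ARCH_NAMES = {
    "sm_50": "Maxwell",
    "sm_80": "Ampere (A100)",
    "sm_86": "Ampere (consumer/workstation)",
    "sm_89": "Ada Lovelace",
    "sm_90": "Hopper (H100 / GH200)",
    "sm_90a": "Hopper (architecture-specific features)",
}

def describe(arch_list):
    """Human-readable summary of an arch list from a torch build."""
    return [f"{a} = {ARCH_NAMES.get(a, 'unknown')}" for a in arch_list]

print(describe(["sm_50", "sm_80", "sm_86", "sm_89", "sm_90", "sm_90a"]))
```

The presence of sm_90 is what matters for GH200 nodes.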
The installation instructions can be found in the pytorch getting started tool, but it doesn't list which platforms are available etc.
Pytorch 2.4.0 was released on 2024-07-24.
This should include ARM + CUDA builds (at least for CUDA 12.4?)
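One quick way to tell a CUDA wheel from a CPU wheel is the local version suffix on torch.__version__ - wheels served from download.pytorch.org carry a +cuXXX or +cpu suffix, while PyPI wheels typically have none. A small sketch (wheel_flavour is a hypothetical helper, not a torch API):

```python
def wheel_flavour(version: str) -> str:
    """Classify a torch version string by its local version suffix.

    Wheels from download.pytorch.org carry "+cuXXX" or "+cpu"; wheels
    from PyPI usually have no suffix, so the suffix alone isn't proof.
    """
    if "+cu" in version:
        return "cuda"
    if "+cpu" in version:
        return "cpu"
    return "no suffix (check torch.cuda.is_available() instead)"

print(wheel_flavour("2.4.0+cu124"))  # cuda
print(wheel_flavour("2.3.0+cpu"))    # cpu
print(wheel_flavour("2.0.1"))        # no suffix (check torch.cuda.is_available() instead)
```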
[x] Check if CUDA is available in arm builds via pip (expected)

pip3 install torch --index-url https://download.pytorch.org/whl/cu118
pip3 install torch
pip3 install torch --index-url https://download.pytorch.org/whl/cu124

2355.1 MB wheel, includes CUDA support for Maxwell+ (['sm_50', 'sm_80', 'sm_86', 'sm_89', 'sm_90', 'sm_90a'])

[x] Check if CUDA is available in arm builds via conda (unsure)
The Conda channel still does not include linux arm64 builds
pytorch-cuda=12.4 and pytorch-cuda=12.1 are unavailable:
PackagesNotFoundError: The following packages are not available from current channels:
pytorch-cuda=12.4*
pytorch-cuda=11.8 is available, but still installs a torch 2.3.0 CPU build:
pytorch pkgs/main/linux-aarch64::pytorch-2.3.0-cpu_py310h5426786_0
pytorch-cuda pytorch/noarch::pytorch-cuda-11.8-h8dd9ede_2
...
$ python3 -c "import torch; print(torch.__version__); print(torch.cuda.is_available()); print(torch.cuda.get_arch_list())"
2.3.0
False
[]
[x] Remove --pre from the example, don't mention nightly

Just a note - on RHEL 9 presently with Python 3.9, you don't get the CUDA-enabled PyTorch 2.4.0; you get CPU PyTorch 2.0.1 from https://download.pytorch.org/whl/cu124 unless you pin torch==2.4.0.
Without:
Installing collected packages: mpmath, urllib3, typing-extensions, sympy, pillow, numpy, networkx, MarkupSafe, idna, filelock, charset-normalizer, certifi, requests, jinja2, torch, torchvision, torchaudio
Successfully installed MarkupSafe-2.1.5 certifi-2022.12.7 charset-normalizer-2.1.1 filelock-3.13.1 idna-3.4 jinja2-3.1.3 mpmath-1.3.0 networkx-3.2.1 numpy-1.26.3 pillow-10.2.0 requests-2.28.1 sympy-1.12 torch-2.0.1 torchaudio-2.0.2 torchvision-0.15.2 typing-extensions-4.9.0 urllib3-1.26.13
(py24test3) [uccaoke@locust uccaoke]$ python3
Python 3.9.18 (main, Jan 4 2024, 00:00:00)
[GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
False
>>>
(py24test3) [uccaoke@locust uccaoke]$ pip list
Package Version
------------------ ---------
certifi 2022.12.7
charset-normalizer 2.1.1
filelock 3.13.1
idna 3.4
Jinja2 3.1.3
MarkupSafe 2.1.5
mpmath 1.3.0
networkx 3.2.1
numpy 1.26.3
pillow 10.2.0
pip 24.2
requests 2.28.1
setuptools 72.2.0
sympy 1.12
torch 2.0.1
torchaudio 2.0.2
torchvision 0.15.2
typing_extensions 4.9.0
urllib3 1.26.13
wheel 0.36.2
(py24test3) [uccaoke@locust uccaoke]$
However,
pip3 install torch==2.4.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
Works.
It's not clear to me why this is.
Bede's Grace-Hopper nodes are currently Rocky 9.4, where pip fetches 2.4.0 using the system Python 3.9.18 (using --no-cache-dir to make sure it wasn't re-using a wheel I'd installed explicitly beforehand).
(.venv) [pheywood@gh001.bede gh-pytorch]$ python3
Python 3.9.18 (main, May 16 2024, 00:00:00)
[GCC 11.4.1 20231218 (Red Hat 11.4.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
/nobackup/projects/bdshe01/pheywood/aarch64/gh-pytorch/.venv/lib64/python3.9/site-packages/torch/_subclasses/functional_tensor.py:258: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
cpu = _conversion_method_template(device=torch.device("cpu"))
>>> torch.cuda.is_available()
True
>>>
(.venv) [pheywood@gh001.bede gh-pytorch]$ pip list
Package Version
----------------- --------
filelock 3.13.1
fsspec 2024.2.0
Jinja2 3.1.3
MarkupSafe 2.1.5
mpmath 1.3.0
networkx 3.2.1
pip 21.2.3
setuptools 53.0.0
sympy 1.12
torch 2.4.0
typing_extensions 4.9.0
pip/setuptools being super old here doesn't seem to be the cause of the difference, as it still fetches 2.4.0 after upgrading both in a fresh venv. Unsure what else could be causing RHEL's pip to resolve the wrong version.
The same applies when requesting torchaudio and torchvision, not just torch above.
Isn't Python packaging wonderful? :D
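One thing worth comparing between the two hosts when pip resolves a surprising version is which wheel tags it will accept - pip debug --verbose prints the full compatible-tag list. A stdlib-only sketch of the main inputs to that decision:

```python
import platform
import sys
import sysconfig

# The interpreter/platform facts pip considers when selecting a wheel.
print("machine:   ", platform.machine())        # e.g. "aarch64" on GH200
print("platform:  ", sysconfig.get_platform())  # basis of the platform tag
print("python tag:", f"cp{sys.version_info.major}{sys.version_info.minor}")
print("libc:      ", platform.libc_ver())       # relevant to manylinux tags
```

If the glibc version or platform tag differs between the RHEL 9 and Rocky 9.4 hosts, that could explain pip falling back to an older wheel on one of them.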
It looks like aarch64 torch wheels with CUDA support may be coming in PyTorch 2.4, which is planned for release in July 2024.
Once this is released, we should check whether the aarch64 wheels include CUDA support, and if so update the aarch64 documentation accordingly.
https://github.com/pytorch/builder/pull/1775