Closed: ptheywood closed this issue 1 month ago
Good news. Owain Kenway's Twitter feed has been full of pained remarks about how tricky it is to build pytorch wheels on GH for several weeks now.
It's easy to build but hard to replicate the performance of the wheel from the NGC container.
Currently my builds are about 10% slower running Stable Diffusion XL inference and I can't work out why.
Hi - @owainkenwayucl, do you have some documents somewhere on how you accomplished that? I'm happy to eat the 10% perf hit if it keeps me from having to decipher incredibly confusing compiler errors.
Nightly Pytorch 2.4 and 2.5 builds using CUDA 12.4 via pip include linux-aarch64 builds with CUDA support (nightly channel pytorch list).
The wheels are very large (2354.8 MB for Python 3.9), as they include all the CUDA deps rather than depending on an external package.
CUDA 11.8 and 12.2 nightly builds still do not include CUDA support.
Conda nightly packages do not include linux-aarch64 builds at all, just osx-arm64, linux-64 and win-64, but installing via pip into a conda env seems to work.
python3 -m venv venv-pytorch-nightly
source venv-pytorch-nightly/bin/activate
python3 -m pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu124
python3 -c "import torch; print(torch.__version__); print(torch.cuda.is_available()); print(torch.cuda.get_arch_list())"
/path/to/venv-pytorch-nightly/lib64/python3.9/site-packages/torch/_subclasses/functional_tensor.py:258: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
cpu = _conversion_method_template(device=torch.device("cpu"))
2.5.0.dev20240621+cu124
True
['sm_50', 'sm_80', 'sm_86', 'sm_89', 'sm_90', 'sm_90a']
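For reference, the sm_XX entries reported above are CUDA compute capabilities. A stdlib-only sketch decoding them into architecture names (the ARCH_NAMES table and describe() helper are illustrative, not part of torch):

```python
# Illustrative mapping from torch.cuda.get_arch_list() entries to NVIDIA
# architecture names; ARCH_NAMES and describe() are hypothetical helpers,
# not part of torch itself.
ARCH_NAMES = {
    "sm_50": "Maxwell",
    "sm_80": "Ampere (A100)",
    "sm_86": "Ampere (consumer/workstation)",
    "sm_89": "Ada Lovelace",
    "sm_90": "Hopper (H100 / GH200)",
    "sm_90a": "Hopper (architecture-specific features)",
}

def describe(arch_list):
    """Human-readable summary of an arch list from a torch build."""
    return [f"{a} = {ARCH_NAMES.get(a, 'unknown')}" for a in arch_list]

print(describe(["sm_50", "sm_80", "sm_86", "sm_89", "sm_90", "sm_90a"]))
```

The presence of sm_90 is what matters for GH200 nodes.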
The installation instructions can be found in the pytorch getting started tool, but it doesn't list which platforms are available etc.
Pytorch 2.4.0 was released on 2024-07-24.
This should include ARM + CUDA builds (at least for CUDA 12.4?)
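One quick way to tell a CUDA wheel from a CPU wheel is the local version suffix on torch.__version__ - wheels served from download.pytorch.org carry a +cuXXX or +cpu suffix, while PyPI wheels typically have none. A small sketch (wheel_flavour is a hypothetical helper, not a torch API):

```python
def wheel_flavour(version: str) -> str:
    """Classify a torch version string by its local version suffix.

    Wheels from download.pytorch.org carry "+cuXXX" or "+cpu"; wheels
    from PyPI usually have no suffix, so the suffix alone isn't proof.
    """
    if "+cu" in version:
        return "cuda"
    if "+cpu" in version:
        return "cpu"
    return "no suffix (check torch.cuda.is_available() instead)"

print(wheel_flavour("2.4.0+cu124"))  # cuda
print(wheel_flavour("2.3.0+cpu"))    # cpu
print(wheel_flavour("2.0.1"))        # no suffix (check torch.cuda.is_available() instead)
```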
[x] Check if CUDA is available in arm builds via pip (expected)

pip3 install torch --index-url https://download.pytorch.org/whl/cu118
pip3 install torch
pip3 install torch --index-url https://download.pytorch.org/whl/cu124

2355.1 MB wheel, includes CUDA support for Maxwell+ (['sm_50', 'sm_80', 'sm_86', 'sm_89', 'sm_90', 'sm_90a'])

[x] Check if CUDA is available in arm builds via conda (unsure)
The Conda channel still does not include linux arm64 builds
pytorch-cuda=12.4 and pytorch-cuda=12.1 are unavailable:
PackagesNotFoundError: The following packages are not available from current channels:
pytorch-cuda=12.4*
pytorch-cuda=11.8 is available, but still installs a torch 2.3.0 CPU build:
pytorch pkgs/main/linux-aarch64::pytorch-2.3.0-cpu_py310h5426786_0
pytorch-cuda pytorch/noarch::pytorch-cuda-11.8-h8dd9ede_2
...
$ python3 -c "import torch; print(torch.__version__); print(torch.cuda.is_available()); print(torch.cuda.get_arch_list())"
2.3.0
False
[]
[x] Remove --pre from the example, don't mention nightly

Just a note - on RHEL 9 presently with Python 3.9, you don't get the CUDA-enabled PyTorch 2.4.0; you get CPU PyTorch 2.0.1 from https://download.pytorch.org/whl/cu124 unless you pin torch==2.4.0.
Without:
Installing collected packages: mpmath, urllib3, typing-extensions, sympy, pillow, numpy, networkx, MarkupSafe, idna, filelock, charset-normalizer, certifi, requests, jinja2, torch, torchvision, torchaudio
Successfully installed MarkupSafe-2.1.5 certifi-2022.12.7 charset-normalizer-2.1.1 filelock-3.13.1 idna-3.4 jinja2-3.1.3 mpmath-1.3.0 networkx-3.2.1 numpy-1.26.3 pillow-10.2.0 requests-2.28.1 sympy-1.12 torch-2.0.1 torchaudio-2.0.2 torchvision-0.15.2 typing-extensions-4.9.0 urllib3-1.26.13
(py24test3) [uccaoke@locust uccaoke]$ python3
Python 3.9.18 (main, Jan 4 2024, 00:00:00)
[GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
False
>>>
(py24test3) [uccaoke@locust uccaoke]$ pip list
Package Version
------------------ ---------
certifi 2022.12.7
charset-normalizer 2.1.1
filelock 3.13.1
idna 3.4
Jinja2 3.1.3
MarkupSafe 2.1.5
mpmath 1.3.0
networkx 3.2.1
numpy 1.26.3
pillow 10.2.0
pip 24.2
requests 2.28.1
setuptools 72.2.0
sympy 1.12
torch 2.0.1
torchaudio 2.0.2
torchvision 0.15.2
typing_extensions 4.9.0
urllib3 1.26.13
wheel 0.36.2
(py24test3) [uccaoke@locust uccaoke]$
However,
pip3 install torch==2.4.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
Works.
It's not clear to me why this is.
Bede's Grace-Hopper nodes are currently Rocky 9.4, where pip fetches 2.4.0 using the system Python 3.9.18 (using --no-cache-dir to make sure it wasn't re-using a wheel I'd installed explicitly beforehand).
(.venv) [pheywood@gh001.bede gh-pytorch]$ python3
Python 3.9.18 (main, May 16 2024, 00:00:00)
[GCC 11.4.1 20231218 (Red Hat 11.4.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
/nobackup/projects/bdshe01/pheywood/aarch64/gh-pytorch/.venv/lib64/python3.9/site-packages/torch/_subclasses/functional_tensor.py:258: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
cpu = _conversion_method_template(device=torch.device("cpu"))
>>> torch.cuda.is_available()
True
>>>
(.venv) [pheywood@gh001.bede gh-pytorch]$ pip list
Package Version
----------------- --------
filelock 3.13.1
fsspec 2024.2.0
Jinja2 3.1.3
MarkupSafe 2.1.5
mpmath 1.3.0
networkx 3.2.1
pip 21.2.3
setuptools 53.0.0
sympy 1.12
torch 2.4.0
typing_extensions 4.9.0
pip/setuptools being super old here doesn't seem to be the cause of the difference, as it still fetches 2.4.0 after upgrading both in a fresh venv. Unsure what else could be causing RHEL's pip to resolve the wrong version.
The same applies when requesting torchaudio and torchvision, not just torch above.
Isn't Python packaging wonderful? :D
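One thing worth comparing between the two hosts when pip resolves a surprising version is which wheel tags it will accept - pip debug --verbose prints the full compatible-tag list. A stdlib-only sketch of the main inputs to that decision:

```python
import platform
import sys
import sysconfig

# The interpreter/platform facts pip considers when selecting a wheel.
print("machine:   ", platform.machine())        # e.g. "aarch64" on GH200
print("platform:  ", sysconfig.get_platform())  # basis of the platform tag
print("python tag:", f"cp{sys.version_info.major}{sys.version_info.minor}")
print("libc:      ", platform.libc_ver())       # relevant to manylinux tags
```

If the glibc version or platform tag differs between the RHEL 9 and Rocky 9.4 hosts, that could explain pip falling back to an older wheel on one of them.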
It looks like aarch64 torch wheels with CUDA support may be coming in PyTorch 2.4, which is planned for release in July 2024.
Once this is released, we should check whether the aarch64 wheels include CUDA support, and if so update the aarch64 documentation accordingly.
https://github.com/pytorch/builder/pull/1775