NixOS / nixpkgs

Nix Packages collection & NixOS

[Tracking] ROCm packages #197885

Open Madouura opened 2 years ago

Madouura commented 2 years ago

Tracking issue for ROCm derivations.

moar packages

Key

- WIP
- Ready
- TODO
- Merged
- ROCm-related
- Notes
- Won't implement

Madouura commented 2 years ago

Updating to 5.3.1, marking all WIP until pushed to their respective PRs and verified.

Madouura commented 2 years ago

If anyone is interested in helping me debug rocBLAS, here's the current derivation. (Already fixed.)

Flakebi commented 2 years ago

Hi, thanks a lot for your work on ROCm packages!

So far, the updates were all aggregated in a single rocm: 5.a.b -> 5.x.y PR. I think that makes more sense than splitting the package updates into individual PRs, for a couple of reasons:

tl;dr, do you mind merging all your 5.3.1 updates into a single PR?

PS: Not sure how you did the update; I usually do it with (fish shell):

    for f in rocm-smi rocm-cmake rocm-thunk rocm-runtime rocm-opencl-runtime rocm-device-libs rocm-comgr rocclr rocminfo llvmPackages_rocm.llvm hip; nix-shell maintainers/scripts/update.nix --argstr commit true --argstr package $f; end

Madouura commented 2 years ago

I was actually afraid of the opposite being true so I split them up. Got it, I'll aggregate them. Thanks for the tip on the update script, that would have saved me a lot of time.

Madouura commented 2 years ago

HIP, I think, should stay separate though, since there are other changes. Actually, never mind, it's just an extra dependency, so it should be fine to split it.

Madouura commented 2 years ago

Done. #198770

Madouura commented 2 years ago

Hold off on merging while I investigate just disabling BUILD_FILE_REORG_BACKWARD_COMPATIBILITY in this and other packages.

#198770 is okay to merge, however.
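
For context, turning BUILD_FILE_REORG_BACKWARD_COMPATIBILITY off in a derivation would look roughly like the sketch below; where exactly the flag gets set (and whether a given component honors it) differs per package:

stdenv.mkDerivation {
  # ...
  cmakeFlags = [
    "-DBUILD_FILE_REORG_BACKWARD_COMPATIBILITY=OFF"
  ];
}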

Madouura commented 2 years ago

#197838 is okay to merge again, see https://github.com/NixOS/nixpkgs/pull/197838#issuecomment-1301578778

Madouura commented 2 years ago

Updated all relevant PRs to note why we are manually defining CMAKE_INSTALL_<DIR>.
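
For readers following along, the manual definitions look roughly like this sketch (the standard GNUInstallDirs variables are shown; the exact set and values differ per package):

cmakeFlags = [
  # Pin the install layout explicitly instead of relying on the
  # file-reorg backwards-compatibility defaults.
  "-DCMAKE_INSTALL_BINDIR=bin"
  "-DCMAKE_INSTALL_LIBDIR=lib"
  "-DCMAKE_INSTALL_INCLUDEDIR=include"
];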

Madouura commented 2 years ago

@acowley, @lovesegfault, and @Flakebi, could you please look at https://github.com/ROCmSoftwarePlatform/rocBLAS/issues/1277? Pinging since you're the maintainers of rocm-llvm. I think we may need to adjust rocm-llvm in some way. (Already fixed.)

Madouura commented 2 years ago

@lovesegfault Mass change to rocmVersion done. Naturally, all the relevant PRs are broken until #198770 is merged; if testing, please make sure to cherry-pick or rebase on that PR.

Madouura commented 2 years ago

Rolled everything into #198770

Madouura commented 2 years ago

May want to change rec to finalAttrs, mainly since we do a cross-derivation check to determine if the derivation is broken.
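
For anyone unfamiliar with the difference, a minimal sketch (not taken from any actual ROCm derivation):

# rec: attributes refer to each other statically, so overridden values are not seen.
stdenv.mkDerivation rec {
  pname = "example";
  version = "5.3.1";
}

# finalAttrs: attributes can reference the final (possibly overridden) attribute
# set, which is what a cross-derivation "is this combination broken?" check needs.
stdenv.mkDerivation (finalAttrs: {
  pname = "example";
  version = "5.3.1";
  meta.broken = finalAttrs.version != "5.3.1";
})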

Madouura commented 2 years ago

@Flakebi Once #202373 is merged, would you mind standardizing the older rocm derivations to the new ones? Assuming @lovesegfault and @acowley are okay with it, of course.

acowley commented 2 years ago

Of course!

Madouura commented 2 years ago

rocRAND already includes hipRAND, and hipRAND has no releases or tags, so until further notice there will be no separate hipRAND derivation.

Madouura commented 2 years ago

If anyone is interested in helping me debug the roctracer derivation, see: https://github.com/Madouura/nixpkgs/blob/05d79d8d63c96f06cf53a00e6e83ab2354fe2bce/pkgs/development/libraries/roctracer/default.nix

Relevant issues:

- https://github.com/ROCm-Developer-Tools/roctracer/issues/82
- https://github.com/ROCm-Developer-Tools/roctracer/issues/83

(Already fixed.)

Madouura commented 1 year ago

The last few packages with a ROCm version are going to be put into a PR soon. After that, I'll work on implementing any missed documentation, impureTests, and -DAMDGPU_TARGETS. Then one final possible ROCm-related rebuild (not anything like the lit one this time) with cleanup/optimization/better patchelf for all the ROCm packages, applying everything I've learned. Then, finally, pytorch and tensorflow.

Madouura commented 1 year ago

If anyone is interested in helping me add rocm support to tensorflow, here is the current WIP: https://github.com/Madouura/nixpkgs/commit/8f667a0df2a7c29a8fb4323d1524b3314cbd9068

LunNova commented 1 year ago

I don't have time to help, but it's really cool that you're working on this, and I'll probably benefit from it at some point in the future if it gets in! Thanks for putting in all the effort here.

Madouura commented 1 year ago

Should we make a rocmPackages attrset similar to cudaPackages? I'm already looking at mass changes to use strictDeps, and it would be nice to include llvmPackages_rocm.<derivation> in it, since the LLVM package for ROCm is completely custom. It would also let us separate major versions (i.e. 5.x.x -> 6.x.x) and keep both, and it would allow us to group packages into other attrsets to emulate the common ROCm layout (i.e. rocm-developer-tools, hip-developer-tools, etc.). A sketch of the idea follows. cc @flakebi @lovesegfault @acowley
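
A rough sketch of what such a scope could look like, purely for illustration (attribute names and file paths below are assumptions, not a final layout):

# pkgs/development/rocm-modules/5/default.nix (hypothetical path)
{ lib, newScope, recurseIntoAttrs }:

lib.makeScope newScope (self: {
  llvm = recurseIntoAttrs (self.callPackage ./llvm { });
  clr = self.callPackage ./clr { };
  rocblas = self.callPackage ./rocblas { };
  # Sub-attrsets could mirror the upstream grouping,
  # e.g. rocm-developer-tools, hip-developer-tools, ...
})

# and in all-packages.nix (hypothetical):
# rocmPackages = recurseIntoAttrs (callPackage ../development/rocm-modules/5 { });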

Flakebi commented 1 year ago

A package set for rocm sounds good to me. Thanks a lot for all of your work on ROCm packages!

happysalada commented 1 year ago

@Madouura thanks a lot for your work on this! I was looking at testing rocm support for torch with openai-whisper. I've added the following package

          (openai-whisper.overrideAttrs (finalAttrs: previousAttrs: {
            torch = python3Packages.torchWithRocm;
          }))

However, when I try to run it with whisper cougar_town.mp3 --model large-v2 --device 'cuda', I get:

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

When I run rocminfo, it does show me a GPU though. If there is anything that can be tried, I'd be happy to test stuff.

(Using cuda as the device comes from the guide here.)

Madouura commented 1 year ago

Strange, I can't find any reason this would be happening in the package's code. Try running the benchmark.py below with torchWithRocm (e.g. inside nix-shell -I nixpkgs=${nixpkgs-unstable} -p python3Packages.torchWithRocm) and paste the output so I can make sure it's not on your side.

benchmark.py:

import torch, timeit

print(f"CUDA support: {torch.cuda.is_available()} (Should be \"True\")")
print(f"CUDA version: {torch.version.cuda} (Should be \"None\")")
print(f"HIP version: {torch.version.hip} (Should contain \"5.7\")")

# Storing ID of current CUDA device
cuda_id = torch.cuda.current_device()
print(f"Current CUDA device ID: {torch.cuda.current_device()}")
print(f"Current CUDA device name: {torch.cuda.get_device_name(cuda_id)} (Should be AMD, not NVIDIA)")

def batched_dot_mul_sum(a, b):
    '''Computes batched dot by multiplying and summing'''
    return a.mul(b).sum(-1)

def batched_dot_bmm(a, b):
    '''Computes batched dot by reducing to bmm'''
    a = a.reshape(-1, 1, a.shape[-1])
    b = b.reshape(-1, b.shape[-1], 1)
    return torch.bmm(a, b).flatten(-3)

x = torch.randn(10000, 1024, device='cuda')

t0 = timeit.Timer(
    stmt='batched_dot_mul_sum(x, x)',
    setup='from __main__ import batched_dot_mul_sum',
    globals={'x': x})

t1 = timeit.Timer(
    stmt='batched_dot_bmm(x, x)',
    setup='from __main__ import batched_dot_bmm',
    globals={'x': x})

# Ran each twice to show difference before/after warmup
print(f'mul_sum(x, x):  {t0.timeit(100) / 100 * 1e6:>5.1f} us')
print(f'mul_sum(x, x):  {t0.timeit(100) / 100 * 1e6:>5.1f} us')
print(f'bmm(x, x):      {t1.timeit(100) / 100 * 1e6:>5.1f} us')
print(f'bmm(x, x):      {t1.timeit(100) / 100 * 1e6:>5.1f} us')

Madouura commented 1 year ago

https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/python-modules/transformers/default.nix may also need to be changed to torchWithRocm. Tensorflow hasn't been ROCmified in nixpkgs yet, so that could also be an upcoming issue.

happysalada commented 1 year ago

Thanks a lot for the help, I got rid of the runtime error. I needed to use override and not overrideAttrs, so for anyone else, this is what I needed:

          (let torchWithRocm = python3Packages.torchWithRocm;
            in
          openai-whisper.override {
            torch = torchWithRocm;
            transformers = python3Packages.transformers.override {
              torch = torchWithRocm;
            };
          })

However, now when I run the program, it seems to get stuck.

I was looking at the gpuTargets inside hip, and indeed mine is missing (I've got gfx90c). Do you think it would be okay to add it? https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/development/compilers/hip/default.nix#L166 The comment looks old, and looking at the supported GPUs, all of gfx9 should be included: https://community.amd.com/t5/knowledge-base/amd-rocm-hardware-and-software-support-document/ta-p/489937 Is there anything to watch out for? Let me know if you think I've missed something.
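
If you want to experiment locally before changing nixpkgs, an overlay along these lines should let you test an extra target, assuming the hip derivation exposes its targets as an override argument (the attribute name below is an assumption; check the linked default.nix, and gfx90c may still fail at runtime):

final: prev: {
  # Sketch: append an extra GPU target to hip via an overlay.
  hip = prev.hip.override (old: {
    gpuTargets = (old.gpuTargets or [ ]) ++ [ "gfx90c" ];
  });
}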

luizirber commented 1 year ago

This is amazing, thanks for all the hard work @Madouura! I've been watching this issue for a long time because I never managed to use ROCm with my (unsupported?) RX 5500 XT, but I can at least start the benchmark now =]

It does fail with missing TensileLibrary.dat in rocBLAS, tho:

$ python benchmark.py
CUDA support: True (Should be "True")
CUDA version: None (Should be "None")
HIP version: 5.4.22803-0 (Should contain "5.4")
Current CUDA device ID: 0
Current CUDA device name: AMD Radeon RX 5500 XT (Should be AMD, not NVIDIA)
mul_sum(x, x):   84.2 us
mul_sum(x, x):   10.9 us

rocBLAS error: Cannot read /nix/store/923xk8qd08k8vyyy98zn91p0dw627xcq-rocblas-5.4.2/lib/rocblas/library/TensileLibrary.dat: No such file or directory
Aborted (core dumped)

I used this flake.nix to create a shell:

{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
  };

  outputs = { self, nixpkgs }:
    let pkgs = nixpkgs.legacyPackages."x86_64-linux";
    in {
      devShells.x86_64-linux.default = pkgs.mkShell {
        buildInputs = [
          pkgs.python3Packages.torchWithRocm
          pkgs.rocm-smi
          pkgs.nvtop-amd
        ];
      };
    };
}

How do I get the .dat files?

Thanks again!

Madouura commented 1 year ago

However, now when I run the program, It seems to get stuck.

I was looking at the gputargets inside hip, and indeed mine is missing (I've got gfx90c) Do you think it would be okay to add it ? https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/development/compilers/hip/default.nix#L166 The comment looks old, and looking at the supported GPUs, all the gfx9 should be included https://community.amd.com/t5/knowledge-base/amd-rocm-hardware-and-software-support-document/ta-p/489937 Is there anything to watch out for ? Let me know if you think I've missed something.

gfx90c likely isn't supported, but go ahead and try adding it to the gpuTargets and see if it works.

It does fail with missing TensileLibrary.dat in rocBLAS, tho:

rocBLAS error: Cannot read /nix/store/923xk8qd08k8vyyy98zn91p0dw627xcq-rocblas-5.4.2/lib/rocblas/library/TensileLibrary.dat: No such file or directory

How do I get the .dat files?

IIRC the RX 5500 XT isn't supported by rocBLAS, so you're out of luck until it is. Make an issue on the rocBLAS repo and see if they can help. Sorry!

Madouura commented 1 year ago

See https://github.com/RadeonOpenCompute/ROCm/blob/77cbac4abab13046ee93d8b5bf410684caf91145/README.md#library-target-matrix for what is definitely supported.

Madouura commented 1 year ago

Okay, never mind, the 5500 is supported by rocBLAS. I'll look into it; please still make that issue if you can.

Madouura commented 1 year ago

@luizirber Did you build rocBLAS yourself, or did you fetch it from the NixOS cache? The Tensile stuff should be generated regardless, and IIRC it's working on Hydra, so this shouldn't be happening.

Madouura commented 1 year ago

@happysalada This may be of some use to you. https://github.com/RadeonOpenCompute/ROCm/issues/1743

Madouura commented 1 year ago

@luizirber Try setting both of these to false: https://github.com/NixOS/nixpkgs/blob/6e30230a696728c3b14a477911293e257de4873d/pkgs/development/libraries/rocblas/default.nix#L22-L23 This still shouldn't be happening with the benchmark, but whatever the issue is on your side, this should fix it. (Note: this will take ages.)
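
If it helps, the usual way to flip those two derivation arguments without editing nixpkgs is an override along these lines (the attribute names below are placeholders; substitute whatever the two linked lines are actually called):

final: prev: {
  rocblas = prev.rocblas.override {
    # Placeholder names for the two options linked above.
    tensileSepArch = false;
    tensileLazyLib = false;
  };
}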

Madouura commented 1 year ago

I should mention I tried exactly that flake.nix setup just now and am still not getting any rocBLAS errors. I wonder if it's your setup, or if there's some switch toggled when the GPU is different? It could be that the 5500 XT's Tensile libraries aren't built and it's defaulting to a non-existent TensileLibrary.dat, in which case the change above may not fix anything.

happysalada commented 1 year ago

Hey, thanks a lot!

I tried what they recommended on the issue and it didn't work. They also mentioned that my GPU likely wouldn't be supported. I'll keep an eye out for when it is. Thanks for your great work on this!

CyberShadow commented 1 year ago

If anyone is interested in helping me add rocm support to tensorflow, here is the current WIP: Madouura@8f667a0

Hi!

mustafasegf commented 1 year ago

Hi there, ROCm 5.6 was released a couple of days ago. Can we upgrade to 5.6, or at least 5.5, to support RDNA 3 graphics cards?

Madouura commented 1 year ago

There are some issues with upgrading that I don't have time for at the moment, but here's a patch to get anyone who needs it now started: rocm.patch.txt

mustafasegf commented 1 year ago

Thank you very much for the patch and for the work you put into ROCm support! I hope you can fix the upgrade issues soon!

Madouura commented 1 year ago

If anyone is interested in helping me add rocm support to tensorflow, here is the current WIP: Madouura@8f667a0

Hi!

* This commit doesn't seem to be on any branch, is that right? Somehow I managed to clone it anyway.

* I see this uses [tensorflow/tensorflow](https://github.com/tensorflow/tensorflow/) and not [ROCmSoftwarePlatform/tensorflow-upstream](https://github.com/ROCmSoftwarePlatform/tensorflow-upstream). Is that right? I heard AMD was upstreaming ROCm support into TF, but looking at it I see it's still active and there are many commits that are not in the upstream repo.

It would be preferable to use the standard tensorflow. That said, if it doesn't work even after the update I'm working on (it's been half a year), I think I'll make a variant in rocmPackages.

kurnevsky commented 1 year ago

Hi. Thanks for maintaining rocm for nix!

When I try to use torchWithRocm I get the following error:

MIOpen(HIP): Error [Compile] 'hiprtcCompileProgram(prog.get(), c_options.size(), c_options.data())' naive_conv.cpp: HIPRTC_ERROR_COMPILATION (6)
MIOpen(HIP): Error [BuildHip] HIPRTC status = HIPRTC_ERROR_COMPILATION (6), source file: naive_conv.cpp
MIOpen(HIP): Warning [BuildHip] hip runtime failed to load.
Error: Please provide architecture for which code is to be generated.
MIOpen Error: /build/source/src/hipoc/hipoc_program.cpp:304: Code object build failed. Source: naive_conv.cpp

Any idea what should be in the environment? I tried adding recent meta.rocm-all but it didn't help.

Madouura commented 1 year ago

Looks like it's a Nix derivation. Could you give it to me so I can replicate this? From a cursory look, either hip is actually not being found or the GPU target is not being specified. Also, please list your GPU(s).

kurnevsky commented 1 year ago

Looks like it's a nix derivation. Could you give me it so I can replicate this?

It's just my nix configs where I added torchWithRocm to the environment packages.

From a cursory look, either hip is actually not being found or the GPU target is not being specified.

I tried to specify the target like this, but as I understand it, this target is enabled by default now. Anyway, the same result with and without it.

Also please list your GPU(s).

It's 7900 XTX.

Madouura commented 1 year ago

Unless I'm seeing this wrong or not looking in the right place: at https://github.com/kurnevsky/nixfiles/blob/bcbd6f98d40bb0bd2e11fb7aae4ff547f01b8f26/modules/desktop.nix#L259C13-L259C13 you're not using torchWithRocm, just pytorch. Try this: https://github.com/NixOS/nixpkgs/issues/197885#issuecomment-1419995566

kurnevsky commented 1 year ago

Yeah, I just didn't commit the change to torchWithRocm since it doesn't work :)

Try this: https://github.com/NixOS/nixpkgs/issues/197885#issuecomment-1419995566

It works fine:

CUDA support: True (Should be "True")
CUDA version: None (Should be "None")
HIP version: 5.7.31921- (Should contain "5.7")
Current CUDA device ID: 0
Current CUDA device name: AMD Radeon RX 7900 XTX (Should be AMD, not NVIDIA)
mul_sum(x, x):   92.7 us
mul_sum(x, x):    5.2 us
bmm(x, x):      430.9 us
bmm(x, x):        8.7 us

But I still have the error when trying to use diffusers, or even when trying to run my custom network. (I vaguely remember somebody mentioning the same error when trying to use a conv2d layer specifically. EDIT: found it: https://discuss.pytorch.org/t/error-while-using-conv2d/180807 - the example from there doesn't work for me either, but it just segfaults my python. EDIT2: actually, it produces the same error if HSA_OVERRIDE_GFX_VERSION is removed.)
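
For reference, the override mentioned above is just an environment variable. One way to set it on NixOS is sketched below; the value assumes an RDNA3/gfx1100 card such as the 7900 XTX, and whether the override helps at all in this case is not guaranteed:

{
  # configuration.nix sketch: spoof the reported GFX version for ROCm userspace.
  environment.variables.HSA_OVERRIDE_GFX_VERSION = "11.0.0";
}

For a one-off test, the same variable can also be exported in the shell before running Python.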

Madouura commented 1 year ago

I'm not entirely sure how I can help. Is this an issue with whatever you're trying to use torchWithRocm with or with torchWithRocm itself?

kurnevsky commented 1 year ago

With torchWithRocm itself. Could you try running this script:

import torch as trc
gpu = trc.device("cuda")

ex1 = trc.zeros(1, 1, 5, 5)
ex1[0, 0, :, 2] = 1

conv1 = trc.nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)
res1 = conv1(ex1)
print(res1)  #shows result

ex2 = ex1.to(gpu)
conv2 = conv1.to(gpu)

res2 = conv2(ex2) #error here
print(res2)

Does it work for you? For me it fails with the error I mentioned.

Madouura commented 1 year ago

Took me a bit to compile torchWithRocm. I need to fix openai-triton to be free again. Anyway, the test worked just fine for me:

nix-shell -I nixpkgs=/home/mado/Documents/Development/nixpkgs -p python3Packages.torchWithRocm

❯ python test.py
tensor([[[[ 0.1783, -0.3823, -0.0870],
          [ 0.1783, -0.3823, -0.0870],
          [ 0.1783, -0.3823, -0.0870]]]], grad_fn=<ConvolutionBackward0>)
tensor([[[[ 0.1783, -0.3823, -0.0870],
          [ 0.1783, -0.3823, -0.0870],
          [ 0.1783, -0.3823, -0.0870]]]], device='cuda:0',
       grad_fn=<ConvolutionBackward0>)

kurnevsky commented 1 year ago

ok, thanks. I assume you use a different GPU? Maybe it's a problem specifically with 7900 XTX...

Madouura commented 1 year ago

It's possible your GPU may not be fully supported yet. I believe your GPU is GFX11? I wonder if that's why.