Madouura opened 2 years ago
Updating to 5.3.1, marking all WIP until pushed to their respective PRs and verified.
If anyone is interested in helping me debug rocBLAS, here's the current derivation
Already fixed.
Hi, thanks a lot for your work on ROCm packages!
So far, the updates were all aggregated in a single rocm: 5.a.b -> 5.x.y PR. I think that makes more sense than splitting the package updates into single PRs, for a couple of reasons:
tl;dr, do you mind merging all your 5.3.1 updates into a single PR?
PS: Not sure how you did the update; I usually do it with:
for f in rocm-smi rocm-cmake rocm-thunk rocm-runtime rocm-opencl-runtime rocm-device-libs rocm-comgr rocclr rocminfo llvmPackages_rocm.llvm hip; nix-shell maintainers/scripts/update.nix --argstr commit true --argstr package $f; end
I was actually afraid of the opposite being true so I split them up. Got it, I'll aggregate them. Thanks for the tip on the update script, that would have saved me a lot of time.
Hip, I think, should stay separate though, since there are other changes. Actually, never mind; it's just an extra dependency, so it should be fine to split it.
Done. #198770
Hold off on merging while I investigate just disabling BUILD_FILE_REORG_BACKWARD_COMPATIBILITY in this and other packages.
Updated all relevant PRs to note why we are manually defining CMAKE_INSTALL_<DIR>.
@acowley, @lovesegfault, and @Flakebi, could you please look at https://github.com/ROCmSoftwarePlatform/rocBLAS/issues/1277? Pinging since you're the maintainers of rocm-llvm. I think we may need to adjust rocm-llvm in some way.
Already fixed.
@lovesegfault Mass change to rocmVersion done.
Naturally, all the relevant PRs are broken until #198770 is merged; if testing, please make sure to cherry-pick or rebase on that PR.
Rolled everything into #198770
May want to change rec to finalAttrs, mainly since we do a cross-derivation check to determine if the derivation is broken.
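For reference, a minimal sketch of the finalAttrs pattern on a hypothetical package (all names and values below are placeholders, not a real derivation):

stdenv.mkDerivation (finalAttrs: {
  pname = "example-rocm-lib";
  version = "5.4.2";
  src = fetchFromGitHub {
    owner = "ROCmSoftwarePlatform";
    repo = finalAttrs.pname;  # resolves to the final, possibly overridden pname
    rev = "rocm-${finalAttrs.version}";
    hash = lib.fakeHash;  # placeholder
  };
  # The broken check can read the final version even after overrideAttrs:
  meta.broken = lib.versionOlder finalAttrs.version "5.0.0";
})

Unlike rec, self-references through finalAttrs still see overridden values, which is what makes a cross-derivation broken check reliable.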
@Flakebi Once #202373 is merged, would you mind standardizing the older rocm derivations to the new ones? Assuming @lovesegfault and @acowley are okay with it, of course.
Of course!
rocRAND already includes hipRAND, and hipRAND has no releases or tags, so until further notice there will be no separate hipRAND derivation.
If anyone is interested in helping me debug the roctracer derivation, see: https://github.com/Madouura/nixpkgs/blob/05d79d8d63c96f06cf53a00e6e83ab2354fe2bce/pkgs/development/libraries/roctracer/default.nix
Relevant issues:
- https://github.com/ROCm-Developer-Tools/roctracer/issues/82
- https://github.com/ROCm-Developer-Tools/roctracer/issues/83
Already fixed.
Last few packages with a ROCm version are gonna be put into a PR soon.
After that, going to work on implementing any missed documentation, impureTests, and -DAMDGPU_TARGETS.
Then, after that, one final possible rocm-related rebuild (not anything like lit this time) with cleanup/optimization/better patchelf for all the rocm packages, using everything I've learned.
Then, finally, pytorch and tensorflow.
If anyone is interested in helping me add rocm support to tensorflow, here is the current WIP: https://github.com/Madouura/nixpkgs/commit/8f667a0df2a7c29a8fb4323d1524b3314cbd9068
Don't have time to help, it's really cool that you're working on this and I'll probably benefit from it at some point in the future if it gets in! Thanks for putting in all the effort here.
Should we make a rocmPackages attrset similar to cudaPackages?
I'm already looking at mass changes using strictDeps, and it would be nice to include llvmPackages_rocm.<derivation> in it, since the llvm package for rocm is completely custom.
It might also be nice since we could separate major changes (e.g. 5.x.x -> 6.x.x) and keep both.
It would also allow us to group different packages into other attrsets to emulate the common ROCm layout (e.g. rocm-developer-tools, hip-developer-tools, etc.).
cc @flakebi @lovesegfault @acowley
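For illustration, a minimal sketch of what such a scope could look like, modeled on lib.makeScope as cudaPackages uses it; the paths and member names here are placeholders, not a proposed layout:

rocmPackages = lib.makeScope pkgs.newScope (self: {
  # self.callPackage resolves dependencies within the scope first,
  # so members can depend on each other without polluting the top level.
  llvm = self.callPackage ../development/rocm/llvm { };
  rocm-runtime = self.callPackage ../development/rocm/rocm-runtime { };
  rocblas = self.callPackage ../development/rocm/rocblas { };
});

Versioned sets (rocmPackages_5, rocmPackages_6, ...) could then coexist, with rocmPackages aliasing the current default.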
A package set for rocm sounds good to me. Thanks a lot for all of your work on ROCm packages!
@Madouura thanks a lot for your work on this! I was looking at testing rocm support for torch with openai-whisper. I've added the following package
(openai-whisper.overrideAttrs (finalAttrs: previousAttrs: {
torch = python3Packages.torchWithRocm;
}))
however, when I try to run it:
whisper cougar_town.mp3 --model large-v2 --device 'cuda'
then I get:
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
When I type rocminfo, it does show me a GPU though. If there is anything that can be tried, I'd be happy to test stuff.
(Using cuda as the device comes from the guide here.)
Strange, I can't find any reason this would be happening in the package's code.
Try running this with torchWithRocm and paste the output so I can make sure it's not on your side.
nix-shell -I nixpkgs=${nixpkgs-unstable} -p python3Packages.torchWithRocm
benchmark.py
import torch, timeit

print(f"CUDA support: {torch.cuda.is_available()} (Should be \"True\")")
print(f"CUDA version: {torch.version.cuda} (Should be \"None\")")
print(f"HIP version: {torch.version.hip} (Should contain \"5.7\")")

# Storing ID of current CUDA device
cuda_id = torch.cuda.current_device()
print(f"Current CUDA device ID: {torch.cuda.current_device()}")
print(f"Current CUDA device name: {torch.cuda.get_device_name(cuda_id)} (Should be AMD, not NVIDIA)")

def batched_dot_mul_sum(a, b):
    '''Computes batched dot by multiplying and summing'''
    return a.mul(b).sum(-1)

def batched_dot_bmm(a, b):
    '''Computes batched dot by reducing to bmm'''
    a = a.reshape(-1, 1, a.shape[-1])
    b = b.reshape(-1, b.shape[-1], 1)
    return torch.bmm(a, b).flatten(-3)

x = torch.randn(10000, 1024, device='cuda')

t0 = timeit.Timer(
    stmt='batched_dot_mul_sum(x, x)',
    setup='from __main__ import batched_dot_mul_sum',
    globals={'x': x})

t1 = timeit.Timer(
    stmt='batched_dot_bmm(x, x)',
    setup='from __main__ import batched_dot_bmm',
    globals={'x': x})

# Ran each twice to show difference before/after warmup
print(f'mul_sum(x, x): {t0.timeit(100) / 100 * 1e6:>5.1f} us')
print(f'mul_sum(x, x): {t0.timeit(100) / 100 * 1e6:>5.1f} us')
print(f'bmm(x, x): {t1.timeit(100) / 100 * 1e6:>5.1f} us')
print(f'bmm(x, x): {t1.timeit(100) / 100 * 1e6:>5.1f} us')
https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/python-modules/transformers/default.nix
This may also need to be changed to torchWithRocm.
Tensorflow hasn't been ROCmified yet in nixpkgs so that could also be an upcoming issue.
Thanks a lot for the help, I got rid of the runtime error. I needed to use override and not overrideAttrs (override changes the arguments passed to the package function, such as torch, while overrideAttrs only changes the derivation's attributes), so for anyone else I needed:
(let torchWithRocm = python3Packages.torchWithRocm;
in
  openai-whisper.override {
    torch = torchWithRocm;
    transformers = python3Packages.transformers.override {
      torch = torchWithRocm;
    };
  })
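As a rule of thumb (a sketch with a hypothetical package foo that takes a torch argument; not specific to openai-whisper):

# override replaces the arguments of the package function (the callPackage inputs):
foo.override { torch = python3Packages.torchWithRocm; }
# overrideAttrs replaces attributes passed to stdenv.mkDerivation, so setting
# torch there has no effect on which torch the package is built against:
foo.overrideAttrs (oldAttrs: { torch = python3Packages.torchWithRocm; })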
However, now when I run the program, it seems to get stuck.
I was looking at the gpuTargets inside hip, and indeed mine is missing (I've got gfx90c). Do you think it would be okay to add it? https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/development/compilers/hip/default.nix#L166 The comment there looks old, and looking at the supported GPUs, all the gfx9 cards should be included: https://community.amd.com/t5/knowledge-base/amd-rocm-hardware-and-software-support-document/ta-p/489937 Is there anything to watch out for? Let me know if you think I've missed something.
This is amazing, thanks for all the hard work @Madouura! I've been watching this issue for a long time because I never managed to use ROCm with my (unsupported?) RX 5500 XT, but I can at least start the benchmark now =]
It does fail with a missing TensileLibrary.dat in rocBLAS, tho:
$ python benchmark.py
CUDA support: True (Should be "True")
CUDA version: None (Should be "None")
HIP version: 5.4.22803-0 (Should contain "5.4")
Current CUDA device ID: 0
Current CUDA device name: AMD Radeon RX 5500 XT (Should be AMD, not NVIDIA)
mul_sum(x, x): 84.2 us
mul_sum(x, x): 10.9 us
rocBLAS error: Cannot read /nix/store/923xk8qd08k8vyyy98zn91p0dw627xcq-rocblas-5.4.2/lib/rocblas/library/TensileLibrary.dat: No such file or directory
Aborted (core dumped)
I used this flake.nix to create a shell:
{
inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
};
outputs = { self, nixpkgs }:
let pkgs = nixpkgs.legacyPackages."x86_64-linux";
in {
devShells.x86_64-linux.default = pkgs.mkShell {
buildInputs = [
pkgs.python3Packages.torchWithRocm
pkgs.rocm-smi
pkgs.nvtop-amd
];
};
};
}
How do I get the .dat files?
Thanks again!
> I was looking at the gpuTargets inside hip, and indeed mine is missing (I've got gfx90c) [...] Is there anything to watch out for?
gfx90c likely isn't supported, but go ahead and try adding it to the gpuTargets and see if it works.
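If you want to experiment before patching nixpkgs, a commonly tried workaround for unofficial gfx9 parts is HSA_OVERRIDE_GFX_VERSION, which makes the ROCm runtime present the card as a supported target. No guarantee it helps for gfx90c; this is a sketch, with 9.0.0 (gfx900) as the usual guess for gfx9 hardware:

pkgs.mkShell {
  buildInputs = [ pkgs.python3Packages.torchWithRocm ];
  # Present the GPU to the ROCm runtime as gfx900:
  shellHook = ''
    export HSA_OVERRIDE_GFX_VERSION=9.0.0
  '';
}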
> It does fail with missing TensileLibrary.dat in rocBLAS [...] How do I get the .dat files?
IIRC the RX 5500 XT isn't supported on rocBLAS; you're out of luck until it is.
Make an issue on the rocBLAS repo and see if they can help. Sorry!
See https://github.com/RadeonOpenCompute/ROCm/blob/77cbac4abab13046ee93d8b5bf410684caf91145/README.md#library-target-matrix for what is definitely supported.
Okay, never mind: the 5500 is supported on rocBLAS. I'll look into it; still make that issue if you can, please.
@luizirber Did you build rocBLAS yourself or did you fetch from the nixos cache? The tensile stuff should be generated regardless and IIRC is working on hydra, so this shouldn't be happening.
@happysalada This may be of some use to you. https://github.com/RadeonOpenCompute/ROCm/issues/1743
@luizirber Try setting these both to false: https://github.com/NixOS/nixpkgs/blob/6e30230a696728c3b14a477911293e257de4873d/pkgs/development/libraries/rocblas/default.nix#L22-L23 This still shouldn't be happening on the benchmark, but whatever it is on your side this should fix it. (Note: This will take ages)
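For reference, an override along these lines, assuming the two flags at the linked lines are the tensileSepArch and tensileLazyLib arguments of the rocblas derivation (double-check the names against that revision):

(rocblas.override {
  # Both default to true in the linked default.nix; disabling them should
  # produce a single monolithic TensileLibrary.dat instead of per-arch,
  # lazily-loaded libraries.
  tensileSepArch = false;
  tensileLazyLib = false;
})

As noted above, rebuilding rocBLAS with Tensile takes a very long time.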
> I used this flake.nix to create a shell [...]
I should mention I did exactly that setup just now and am still not getting any rocBLAS errors. I wonder if it's your setup, or whether there's some switch toggled if the GPU is different? It could be that the 5500 XT's Tensile stuff isn't built and it's defaulting to a non-existent TensileLibrary.dat, in which case the above change may not fix anything.
Hey, thanks a lot!
I tried what they recommended on the issue and it didn't work. They also mentioned that my GPU likely wouldn't be supported. I'll keep an eye out for when it is. Thanks for your great work on this!
Hi there, ROCm 5.6 released a couple of days ago. Can we upgrade to 5.6, or at least 5.5, to support RDNA 3 graphics cards?
There are some issues with upgrading that I don't have time for at the moment, but here's a patch to get started for anyone who needs it fixed now: rocm.patch.txt
Thank you very much for the patch and for the work you put into ROCm support! I hope you can fix the upgrade issue soon!
> If anyone is interested in helping me add rocm support to tensorflow, here is the current WIP: Madouura@8f667a0
Hi!
* This commit doesn't seem to be on any branch, is that right? Somehow I managed to clone it anyway.
* I see this uses [tensorflow/tensorflow](https://github.com/tensorflow/tensorflow/) and not [ROCmSoftwarePlatform/tensorflow-upstream](https://github.com/ROCmSoftwarePlatform/tensorflow-upstream). Is that right? I heard AMD was upstreaming ROCm support into TF, but looking at it I see it's still active and there are many commits that are not in the upstream repo.
It would be preferable to use the standard tensorflow. That said, if it doesn't work even after the update I'm working on (it's been half a year), I think I'll make a variant in rocmPackages.
Hi. Thanks for maintaining rocm for nix!
When I try to use torchWithRocm I get the following error:
MIOpen(HIP): Error [Compile] 'hiprtcCompileProgram(prog.get(), c_options.size(), c_options.data())' naive_conv.cpp: HIPRTC_ERROR_COMPILATION (6)
MIOpen(HIP): Error [BuildHip] HIPRTC status = HIPRTC_ERROR_COMPILATION (6), source file: naive_conv.cpp
MIOpen(HIP): Warning [BuildHip] hip runtime failed to load.
Error: Please provide architecture for which code is to be generated.
MIOpen Error: /build/source/src/hipoc/hipoc_program.cpp:304: Code object build failed. Source: naive_conv.cpp
Any idea what should be in the environment? I tried adding the recent meta.rocm-all but it didn't help.
Looks like it's a nix derivation. Could you give me it so I can replicate this? From a cursory look, either hip is actually not being found or the GPU target is not being specified. Also please list your GPU(s).
> Looks like it's a nix derivation. Could you give me it so I can replicate this?
It's just my nix configs where I added torchWithRocm to the environment packages.
> From a cursory look, either hip is actually not being found or the GPU target is not being specified.
I tried to specify the target like this, but as I understand it, this target is enabled by default now. Anyway, it's the same result with and without it.
> Also please list your GPU(s).
It's a 7900 XTX.
Unless I'm seeing this wrong or not looking hard enough, at https://github.com/kurnevsky/nixfiles/blob/bcbd6f98d40bb0bd2e11fb7aae4ff547f01b8f26/modules/desktop.nix#L259C13-L259C13 you're not using torchWithRocm, just pytorch. Try this: https://github.com/NixOS/nixpkgs/issues/197885#issuecomment-1419995566
Yeah, I just didn't commit the change to torchWithRocm since it doesn't work :)
> Try this: https://github.com/NixOS/nixpkgs/issues/197885#issuecomment-1419995566
It works fine:
CUDA support: True (Should be "True")
CUDA version: None (Should be "None")
HIP version: 5.7.31921- (Should contain "5.7")
Current CUDA device ID: 0
Current CUDA device name: AMD Radeon RX 7900 XTX (Should be AMD, not NVIDIA)
mul_sum(x, x): 92.7 us
mul_sum(x, x): 5.2 us
bmm(x, x): 430.9 us
bmm(x, x): 8.7 us
But I still have the error when trying to use diffusers, or even when trying to run my custom network. (I vaguely remember somebody mentioned the same error when trying to use a conv2d layer specifically. EDIT: found it: https://discuss.pytorch.org/t/error-while-using-conv2d/180807 - the example from there doesn't work for me either, but it just segfaults my python. EDIT2: actually it produces the same error if HSA_OVERRIDE_GFX_VERSION is removed.)
I'm not entirely sure how I can help. Is this an issue with whatever you're trying to use torchWithRocm with, or with torchWithRocm itself?
With torchWithRocm itself. Could you try running this script:
import torch as trc
gpu = trc.device("cuda")
ex1 = trc.zeros(1, 1, 5, 5)
ex1[0, 0, :, 2] = 1
conv1 = trc.nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)
res1 = conv1(ex1)
print(res1)  # shows result
ex2 = ex1.to(gpu)
conv2 = conv1.to(gpu)
res2 = conv2(ex2)  # error here
print(res2)
Does it work for you? For me it fails with the error I mentioned.
Took me a bit to compile torchWithRocm. I need to fix openai-triton to be free again.
Anyway, the test worked just fine for me.
nix-shell -I nixpkgs=/home/mado/Documents/Development/nixpkgs -p python3Packages.torchWithRocm
❯ python test.py
tensor([[[[ 0.1783, -0.3823, -0.0870],
[ 0.1783, -0.3823, -0.0870],
[ 0.1783, -0.3823, -0.0870]]]], grad_fn=<ConvolutionBackward0>)
tensor([[[[ 0.1783, -0.3823, -0.0870],
[ 0.1783, -0.3823, -0.0870],
[ 0.1783, -0.3823, -0.0870]]]], device='cuda:0',
grad_fn=<ConvolutionBackward0>)
OK, thanks. I assume you use a different GPU? Maybe it's a problem specifically with the 7900 XTX...
It's possible your GPU may not be fully supported yet. I believe your GPU is GFX11? I wonder if that's why.
Tracking issue for ROCm derivations.
Key: WIP - Ready - TODO

Merged (ROCm-related):
- #261155
- #263048
Notes:
nix-shell maintainers/scripts/update.nix --argstr commit true --argstr keep-going true --arg predicate '(path: pkg: builtins.elem (pkg.pname or null) [ "rocm-llvm-llvm" "rocm-core" "rocm-cmake" "rocm-thunk" "rocm-smi" "rocm-device-libs" "rocm-runtime" "rocm-comgr" "rocminfo" "clang-ocl" "rdc" "rocm-docs-core" "hip-common" "hipcc" "clr" "hipify" "rocprofiler" "roctracer" "rocgdb" "rocdbgapi" "rocr-debug-agent" "rocprim" "rocsparse" "rocthrust" "rocrand" "rocfft" "rccl" "hipcub" "hipsparse" "hipfort" "hipfft" "tensile" "rocblas" "rocsolver" "rocwmma" "rocalution" "rocmlir" "hipsolver" "hipblas" "miopengemm" "composable_kernel" "half" "miopen" "migraphx" "rpp-hip" "mivisionx-hip" "hsa-amd-aqlprofile-bin" ])'
Won't implement: strictDeps for all derivations.