ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Is there any support for AMD GPU (ROCM) #2540

Closed mdrokz closed 1 year ago

mdrokz commented 1 year ago

Hi, I was wondering if there is any support for using llama.cpp with an AMD GPU. Is there a ROCm implementation?

MichaelDays commented 1 year ago

There’s a ROCm branch that hasn’t been merged yet, but is being maintained by the author.

https://github.com/ggerganov/llama.cpp/pull/1087/commits
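
For reference, a minimal build sketch for that branch, assuming the LLAMA_HIPBLAS flag it introduces and a ROCm toolchain under /opt/rocm (flag name and paths worth verifying against the branch's README):

$ # Makefile build with the hipBLAS backend
$ make LLAMA_HIPBLAS=1

$ # or via CMake, using ROCm's clang
$ CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ \
    cmake -B build -DLLAMA_HIPBLAS=ON
$ cmake --build build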

aiaicode commented 1 year ago

Have you tried this: https://github.com/ggerganov/llama.cpp#clblast

To check if you have CUDA support via ROCm, do the following:

$ python
>>> import torch
>>> torch.cuda.is_available()

Output: True or False

If it's True, then you have the right ROCm and PyTorch installed and things should work. At least for Stable Diffusion, that's how you check and make it work.

If it's False, then you need to check whether your GPU has CUDA support or not. You can see the AMD OpenCL supported devices list here: https://en.wikipedia.org/wiki/OpenCL#Devices

mdrokz commented 1 year ago

There’s a ROCm branch that hasn’t been merged yet, but is being maintained by the author.

https://github.com/ggerganov/llama.cpp/pull/1087/commits

Oh, any idea why it hasn't been merged yet?

mdrokz commented 1 year ago

Have you tried this: https://github.com/ggerganov/llama.cpp#clblast

To check if you have CUDA support via ROCm, do the following:

$ python
>>> import torch
>>> torch.cuda.is_available()

Output: True or False

If it's True, then you have the right ROCm and PyTorch installed and things should work. At least for Stable Diffusion, that's how you check and make it work.

If it's False, then you need to check whether your GPU has CUDA support or not. You can see the AMD OpenCL supported devices list here: https://en.wikipedia.org/wiki/OpenCL#Devices

llama.cpp doesn't use torch, as it's a custom implementation, so that won't work. Stable Diffusion uses torch by default, and torch supports ROCm.

SlyEcho commented 1 year ago

Oh, any idea why it hasn't been merged yet?

Soon, I just haven't had time recently to work on it.

ghost commented 1 year ago

Oh, any idea why it hasn't been merged yet?

Soon, I just haven't had time recently to work on it.

What work needs to be done (except for Windows support, maybe)? Your version is working fine here with my 6650 XT and Linux.

SlyEcho commented 1 year ago

I want to add some CI checks to see if it compiles, so that if the CUDA code is updated, it would not break (well, at least not break the build).

Then some small tweaks, like having the UI say "ROCm" or "HIP" instead of "CUDA".

shibe2 commented 1 year ago

AMD GPUs are supported through CLBlast. In my experience, ROCm is much more problematic than OpenCL. I recommend going with CLBlast unless you get better performance with another option or have some specific reason to choose one.
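
For context, a CLBlast build is a one-flag change; a minimal sketch, assuming the LLAMA_CLBLAST flag from the README of that era, CLBlast installed system-wide, and an example model path:

$ # build with the CLBlast backend
$ make LLAMA_CLBLAST=1

$ # offload layers to the GPU at run time with -ngl
$ ./main -m models/7B/ggml-model-q4_0.bin -ngl 32 -p "Hello"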

mdrokz commented 1 year ago

AMD GPUs are supported through CLBlast. In my experience, ROCm is much more problematic than OpenCL. I recommend going with CLBlast unless you get better performance with another option or have some specific reason to choose one.

Oh wow, I didn't know that; I will try CLBlast. Do you know the performance difference between CLBlast and ROCm?

ghost commented 1 year ago

For me, ROCm is much faster compared to CLBlast. And I don't see any reason not to use ROCm (at least when we speak about Linux; ROCm for Windows is still really new), if the hardware/OS is supported, which is the only downside right now. There is a comparison between ROCm and CLBlast here, but I think it is a bit outdated:

https://github.com/YellowRoseCx/koboldcpp-rocm/

It's for koboldcpp, but this uses llama.cpp.

SlyEcho commented 1 year ago

The prompt evaluation is much faster with ROCm. If the OpenCL path could be optimized, I'm sure they could be similar. The only downside to OpenCL is that the memory management is not as advanced as it is in ROCm/CUDA.

I would keep an eye on the Vulkan version in #2059; it has a lot of promise to support a much wider set of devices.

mdrokz commented 1 year ago

For me, ROCm is much faster compared to CLBlast. And I don't see any reason not to use ROCm (at least when we speak about Linux; ROCm for Windows is still really new), if the hardware/OS is supported, which is the only downside right now. There is a comparison between ROCm and CLBlast here, but I think it is a bit outdated:

https://github.com/YellowRoseCx/koboldcpp-rocm/

It's for koboldcpp, but this uses llama.cpp.

Which GPU did you use? I can't get CLBlast to run on my RX 6700 XT. Also, is OpenCL through ROCm supported?

shibe2 commented 1 year ago

Although OpenCL and ROCm are different APIs, OpenCL driver for Radeon RX 6xxx is based on ROCm code (see AMD CLR). CLBlast supports Radeon RX 6700 XT out of the box with the default driver on Linux.

@mdrokz You need to make sure that OpenCL is working properly on your system. Try clinfo and other software that uses OpenCL. Also make sure that llama.cpp is compiled with CLBlast.
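
A quick sketch of both checks, assuming clinfo is installed and the binary was built with the CLBlast flag shown earlier:

$ # 1. is any OpenCL platform/device visible at all?
$ clinfo | grep -E 'Platform Name|Device Name'

$ # 2. does llama.cpp pick up CLBlast? It prints ggml_opencl lines at startup:
$ ./main -m models/7B/ggml-model-q4_0.bin -ngl 1 -p test 2>&1 | grep ggml_opencl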

SlyEcho commented 1 year ago

If you use Linux, you have to install AMD's OpenCL platform. The open source Mesa project has two OpenCL platforms: the old Clover, which may work or may crash your whole PC, and the new Rusticl, which is probably the future but is not as fast right now.

It should work on Windows; there are CLBlast binaries available on our releases page.

efschu commented 1 year ago

Is it possible to split over multiple ROCm devices? If so, is it possible to mix AMD and NVIDIA cards?

mdrokz commented 1 year ago

Although OpenCL and ROCm are different APIs, OpenCL driver for Radeon RX 6xxx is based on ROCm code (see AMD CLR). CLBlast supports Radeon RX 6700 XT out of the box with the default driver on Linux.

@mdrokz You need to make sure that OpenCL is working properly on your system. Try clinfo and other software that uses OpenCL. Also make sure that llama.cpp is compiled with CLBlast.

Well, I use the open source Mesa drivers, so I installed the mesa-opencl package. clinfo works, but when I run the program I get this error:

ggml_opencl: selecting platform: 'Clover'
ggml_opencl: selecting device: 'AMD Radeon RX 6700 XT (navi22, LLVM 15.0.7, DRM 3.52, 6.3.4-201.fsync.fc37.x86_64)'
ggml_opencl: device FP16 support: false
ggml_opencl: kernel compile error:

fatal error: cannot open file '/usr/lib64/clc/gfx1031-amdgcn-mesa-mesa3d.bc': No such file or directory

mdrokz commented 1 year ago

If you use Linux, you have to install AMD's OpenCL platform. The open source Mesa project has two OpenCL platforms: the old Clover, which may work or may crash your whole PC, and the new Rusticl, which is probably the future but is not as fast right now.

It should work on Windows; there are CLBlast binaries available on our releases page.

Yeah, I realized I have to use the AMDGPU driver to run it. I found a Docker image for that, and I will try running it through that, because I use the open source Mesa drivers. And also, I don't use Windows :(

shibe2 commented 1 year ago

@mdrokz You can compile or find packages for open source ROCm-based OpenCL driver. Where did you get ROCm from?

SlyEcho commented 1 year ago

Is it possible to split over multiple ROCm devices? If so, is it possible to mix AMD and NVIDIA cards?

Absolute heresy.

Jokes aside, it wouldn't work like that; they use completely different architectures and are compiled separately. There are some libraries like Orochi that promise to do this, but right now we are using CUDA for Nvidia and HIP for AMD.

Maybe the MPI stuff would be useful in this case?


Well, I use the open source Mesa drivers, so I installed the mesa-opencl package. clinfo works, but when I run the program I get this error:

It depends on the distro; for Arch, there are packages for the AMD OpenCL driver. On others, maybe AMD's installer can help you.

It's possible to have multiple platforms installed, clinfo will show them all. llama.cpp has environment flags to help the program choose the right one if it happens to load something that is not desired.
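
Those flags are the GGML_OPENCL_PLATFORM and GGML_OPENCL_DEVICE environment variables; a sketch of pinning the AMD platform when Clover or Rusticl is also installed (selection appears to accept an index or a name substring; verify against ggml-opencl.cpp):

$ GGML_OPENCL_PLATFORM=AMD GGML_OPENCL_DEVICE=0 \
    ./main -m models/7B/ggml-model-q4_0.bin -ngl 32 -p "Hello"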

mdrokz commented 1 year ago

@mdrokz You can compile or find packages for open source ROCm-based OpenCL driver. Where did you get ROCm from?

I'm using Fedora 37. I got ROCm from here: http://repo.radeon.com/rocm/yum/5.2.3/main/ and I installed rocm-opencl.
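
For reference, the repo setup that implies looks roughly like this (a sketch based on AMD's generic yum instructions; the gpgkey URL is an assumption worth checking):

$ sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
[ROCm]
name=ROCm
baseurl=http://repo.radeon.com/rocm/yum/5.2.3/main
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
$ sudo dnf install rocm-opencl clinfo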

efschu commented 1 year ago

Is it possible to split over multiple ROCm devices? If so, is it possible to mix AMD and NVIDIA cards?

Absolute heresy.

OK, no split between NVIDIA and AMD.

But is it possible to use multiple ROCm devices like I can with CUDA?

Loading models in OpenCL consumes less VRAM than loading them in CUDA. The problem is, big models still need multiple GPUs.

SlyEcho commented 1 year ago

It should be possible to use multiple AMD cards, but I haven't tested it myself.

The OpenCL code uses somewhat different memory management, and it seems to be more efficient, but this is a known issue.

shibe2 commented 1 year ago

@mdrokz I know that version 5.6 works. Sometimes it may be necessary to set some environment variables to enable/disable OpenCL drivers, for example, OCL_ICD_VENDORS. clinfo should have "AMD-APP" in Platform Version and "HSA" in Driver Version. If clinfo shows multiple devices, you can use GGML_OPENCL_PLATFORM to select the correct driver.
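
A sketch of forcing the ocl-icd loader to see only the ROCm driver, assuming the usual amdocl64.icd filename (adjust to whatever is actually in /etc/OpenCL/vendors):

$ mkdir -p ~/rocm-only
$ cp /etc/OpenCL/vendors/amdocl64.icd ~/rocm-only/
$ OCL_ICD_VENDORS=~/rocm-only clinfo | grep -E 'Platform Version|Driver Version'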

efschu commented 1 year ago

It should be possible to use multiple AMD cards, but I haven't tested it myself.

The OpenCL code uses somewhat different memory management, and it seems to be more efficient, but this is a known issue.

Well, how do I split across multiple ROCm devices?

The way I do it with CUDA is not working; it loads only on the first card.
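
With the CUDA backend of that era, the split is set at run time; a sketch, assuming the hipBLAS build exposes the same options (plausible, since the ROCm port reuses the CUDA code path, but worth verifying):

$ # split tensors roughly 60/40 across two GPUs, scratch buffers on GPU 0
$ ./main -m models/65B/ggml-model-q4_0.bin -ngl 80 \
    --tensor-split 3,2 --main-gpu 0 -p "Hello"

$ # limit which devices HIP sees at all (analogue of CUDA_VISIBLE_DEVICES)
$ HIP_VISIBLE_DEVICES=0,1 ./main -m models/65B/ggml-model-q4_0.bin -ngl 80 -p "Hello"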

arch-btw commented 1 year ago

@mdrokz do you have libclc installed?

I'm not familiar with how to use yum, but here's the project's website: https://libclc.llvm.org

Maybe this more specifically: https://packages.fedoraproject.org/pkgs/libclc/libclc/
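
A quick sketch for checking whether the bitcode file from the earlier error is actually on disk, and which Fedora package should own it:

$ ls /usr/lib64/clc/ | grep gfx1031
$ dnf provides '*/gfx1031-amdgcn-mesa-mesa3d.bc'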

mdrokz commented 1 year ago

@mdrokz I know that version 5.6 works. Sometimes it may be necessary to set some environment variables to enable/disable OpenCL drivers, for example, OCL_ICD_VENDORS. clinfo should have "AMD-APP" in Platform Version and "HSA" in Driver Version. If clinfo shows multiple devices, you can use GGML_OPENCL_PLATFORM to select the correct driver.

I don't have much idea about that. Can you check this gist: https://gist.github.com/mdrokz/303ca842dcf63df733b3ab27b6f1dd14 Which platform should I use? It's showing 3 currently.

mdrokz commented 1 year ago

@mdrokz do you have libclc installed?

I'm not familiar with how to use yum, but here's the project's website: https://libclc.llvm.org

Maybe this more specifically: https://packages.fedoraproject.org/pkgs/libclc/libclc/

Yes, I have libclc installed.

mdrokz commented 1 year ago

I ended up using the amdgpu driver in a Docker container, like this: https://github.com/mdrokz/rust-llama.cpp/blob/implement_blas_support/examples/opencl/Dockerfile This Dockerfile works on my GPU (RX 6700 XT).
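
For anyone reproducing this: a ROCm-capable container needs the kernel interfaces passed through. A minimal run sketch, with a hypothetical image tag standing in for whatever that Dockerfile is built as:

$ docker build -t llama-opencl -f Dockerfile .
$ docker run -it --device=/dev/kfd --device=/dev/dri \
    --group-add video --security-opt seccomp=unconfined \
    llama-opencl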