Atinoda / text-generation-webui-docker

Docker variants of oobabooga's text-generation-webui, including pre-built images.

AMD / ROCM support #27

Open - panayotovip opened 11 months ago

panayotovip commented 11 months ago

Is there a way to run this with an AMD GPU?

Atinoda commented 11 months ago

It should be possible to use AMD to a degree (there is some AMD support in the upstream software), but I do not currently have any suitable AMD hardware to develop on for this project. I plan to offer support at some point in the future though! Bear in mind that ROCM will lag CUDA, because Nvidia has spent many years and a lot of effort capturing the ML market...

Atinoda commented 10 months ago

In a fit of enthusiasm, I procured a Radeon W5500X to build and test on. Unfortunately, I did not appreciate that ROCM != CUDA. AMD supports almost no cards for compute, especially on the consumer side; they appear to only really be interested in supercomputer customers. I've burnt over 12 hours trying to hack ROCM into compiling and running ML workloads properly - despite some success, I am not happy with the stability, performance, or compatibility.

I have decided that there will be no support outside of ROCM's officially supported cards - I will not make up for AMD's woeful support of their own hardware. I am considering purchasing a ROCM-supported card, but that is now de-prioritised due to the cost - and the additional PSU watts required!

Please recognise the excellent work of:

The following resources are useful for ROCM support information

Atinoda commented 10 months ago

Issue was closed accidentally! Please post any AMD thoughts here.

My current advice to AMD users is to try out LLMs on CPU (with patience) and, if you like it, buy an Nvidia card with as much VRAM as you can afford! The exception is if your card is actually supported by ROCM - then there's a decent chance of running models without too much hassle...

serhii-nakon commented 10 months ago

@Atinoda Hello. Thank you for mentioning me. Technically, AMD supports more cards than are described on their official site...

Here is a list of all the cards (GFX codes) that they support (a card can be a regular RX or a professional WX model, since they use the same or very similar chips - for example, they support both the RX 7900 and WX 7900): https://hub.docker.com/layers/rocm/pytorch/latest/images/sha256-56ea92985c9445fe524fcb2a6b657994599a5b81b69ed36559c3eb2724cee864?context=explore

(screenshot: list of supported GFX targets)

I remember that some RX/WX 6000 cards should work too.
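For anyone checking where their card falls, a quick way to see which GFX target a card reports - assuming the host has the ROCm userspace tools installed - is:

```sh
# Requires the ROCm userspace tools (rocminfo) on the host.
rocminfo | grep gfx
# e.g. an RX 6800 (Navi 21) reports gfx1030; a 7900 XTX reports gfx1100.
```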

Atinoda commented 7 months ago

I have implemented ROCM support in line with the upstream project. At this stage, it is untested because I do not have hardware to run it. Please give it a go and see if it works! Reports are welcomed.

Alkali-V2 commented 7 months ago

So I am actually between two of your open issues: RoCM support for an AMD GPU, and the unRAID container that another person was working on. The good news is that the Docker container works! No issues - the only error I saw was a complaint about numpy being 1.22 instead of 1.24, but it didn't slow me down. As for the testing, it seems I cannot get the Docker container to see my GPU. In the logs it loads as CPU Extended, which I think might be a fallback. However, to prove or disprove that, here are my edits to the docker-compose.yml:

(screenshot: docker-compose.yml edits)

I modified line 4 to align with default-rocm as seen above, and commented out the Nvidia deployment steps at the bottom. I also modified the "target" on line 6 of docker-compose.build.yml to 'rocm', as seen below.

(screenshot: docker-compose.build.yml edits)

Outside of that, I followed an additional guide to fill out my settings for the unRAID template for launching it alongside other containers: https://github.com/oobabooga/text-generation-webui/issues/4850

(screenshot: unRAID template settings)

So with all of that taken care of, and the chat both opening and loading, I noticed from the logs that I was starting in CPU Extended mode despite calling for RoCM in the docker-compose:

(screenshot: startup logs showing CPU Extended)

I validated in the PCI settings that the GPU was not in use by unRAID, and that the kernel-level driver loads amdgpu, the same as on my working Arch system. Did I miss something, or did I edit the compose files incorrectly?

Unraid: 6.12.18
CPU: AMD 3900 XT
GPU: Radeon RX 6800 (amdgpu Linux driver)
RAM: 32GB

Edit: One additional thing I wanted to mention: in oobabooga's text-generation-webui repo, I additionally had to edit a file for it to recognize my GPU. I had to modify "one_click.py" - which I assume handles initial installation of the system - filling out my specific GPU information on lines 16-18; in this case, my 7900 XTX is gfx1100. I would need to look up what the RX 6800 in this system is equal to.

(screenshot: one_click.py lines 16-18)
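For reference, the commented-out items in question correspond to standard ROCm environment variables; a hypothetical shell sketch of the equivalents (values are examples - the RX 6800, being Navi 21, falls under gfx1030):

```sh
# Hypothetical shell equivalents of the one_click.py overrides; values are examples.
export ROCM_PATH=/opt/rocm              # where ROCm lives on a bare-metal install
export HSA_OVERRIDE_GFX_VERSION=10.3.0  # spoof a supported GFX version (gfx1030 family)
export HCC_AMDGPU_TARGET=gfx1030        # RX 6800 / Navi 21 target
```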

Atinoda commented 7 months ago

Hi @Alkali-V2 - thanks very much for testing this with your AMD GPU, and for your detailed post! I would be happy to work with you and try to get it up and running on your hardware.

I've read through your post - can I please confirm a couple of things with you?

  1. Are you building or pulling the image?
  2. Are you running the container via Unraid?
  3. Have you successfully accelerated text-generation-webui before on bare-metal with a 7900XTX?

To get to accelerated inference we need two things: 1) the container must use supported libraries, and 2) the container must be granted access to the GPU hardware.

Regarding Step 1, the message that you are seeing for the CPU Extended version definitely means that it is not using (or building) the ROCM image. I think that might be due to specifying rocm as the target in docker-compose.build.yml. The target should be default-rocm - but this only applies if you are building the software! Hopefully you can use the pre-built image instead, to save time on setup.
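For illustration, a minimal sketch of the relevant stanza in docker-compose.build.yml (the service name and build context here are assumptions, not the repo's exact layout):

```yaml
# Sketch only - fields other than `target` are illustrative.
services:
  text-generation-webui:
    build:
      context: .
      target: default-rocm   # not `rocm` - the variant names carry the `default-` prefix
```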

For Step 2, unfortunately ROCM is a bit more awkward to pass through to Docker... I did hack my unsupported GPU into running ROCM workloads (then immediately segfaulting!) but I think it was correctly available to the container. Please try adding the following - the equivalent of the deploy section for nvidia - to your docker-compose.yml:

group_add:
  - video                  # grant access to the host's video device group
ipc: host
devices:
  - /dev/kfd               # ROCM compute driver interface
  - /dev/dri               # direct rendering interface (GPU device nodes)
cap_add:
  - SYS_PTRACE
security_opt:
  - seccomp=unconfined     # relaxes syscall filtering for the container

Note that there are some heavy-duty permissions granted there... so bear that in mind for the purposes of security. Good luck, and please let me know how you get on!
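In case it helps with placement, a sketch of where those keys sit in docker-compose.yml - at the service level, alongside image (the service name is illustrative):

```yaml
# Sketch: ROCM GPU passthrough at the service level (service name illustrative).
services:
  text-generation-webui:
    image: atinoda/text-generation-webui:default-rocm
    ipc: host
    group_add:
      - video
    devices:
      - /dev/kfd
      - /dev/dri
    cap_add:
      - SYS_PTRACE
    security_opt:
      - seccomp=unconfined
```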

Alkali-V2 commented 7 months ago

@Atinoda Thanks for getting back to me, and for your work on this project! It has greatly simplified this process, and I know several Unraid forum users who agree. Edit: I am fairly new to Docker and didn't realize that docker-compose.yml and docker-compose.build.yml were both dependencies for building. I was following an Unraid guide on how to add your own Docker images: https://www.reddit.com/r/unRAID/comments/tm2hzn/how_to_install_a_dockerfile_not_using_docker_hub/ - the top-voted comment mentioned building, so I followed that. That likely explains exactly why RoCM didn't compile correctly: my docker-compose.build.yml didn't say 'default-rocm' as you suggested.

However, one note for anyone who may also be trying this at home: the command 'docker compose up' will fail with compose reported as an invalid command, because Unraid does not have the Compose plugin installed by default. There is a Docker Compose plugin, though, located here: https://forums.unraid.net/topic/114415-plugin-docker-compose-manager/page/19/ - there were some concerns on the latest pages about its recent update, so I am using build again for this test.
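A quick way to check whether the plugin is available (this should error out on a stock Unraid install):

```sh
# Prints the Compose plugin version if installed; errors otherwise.
docker compose version
```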

To answer your questions:

  1. I am building the image due to the issue above. If I can get compose working in the future, I may go that route. I modified my docker-compose.build.yml to contain the changes you mentioned, as seen below. I hope that was right - I'm going to find out soon...

(screenshot: updated docker-compose.build.yml)

  2. I am executing the container via Unraid. The screenshot of my settings and the link above are what I am using for the template. One thing of note: for "Extra args" I did have to use the escape character from this issue: https://github.com/Atinoda/text-generation-webui-docker/issues/25 to make it past the first argument. So my template args section looks like this: (screenshot)

  3. I have successfully installed the 7900 XTX bare-metal on my personal computer using RoCM, and tested it extensively to ensure that it was within the range of results seen here: https://www.reddit.com/r/LocalLLaMA/comments/191srof/amd_radeon_7900_xtxtx_inference_performance/ - I reached about 90% of his performance, and I can visibly see my GPU handling the work via the system monitor in KDE.

I rebuilt just now with the changes to my docker-compose.build.yml, and I am still seeing CPU Extended in the logs. Did I miss anything additional?

Alkali-V2 commented 7 months ago

Okay, so I did some additional digging to see what might have been missing once I had the container running. Using 'docker exec -u 0 -it XXXXXX /bin/bash' I was able to install nano into the container and have a look around. In the one_click.py file inside the container, I noted that the three items that have to be uncommented for AMD are still commented out: (screenshot)

This led me to check the /opt folder, where RoCM is installed on my bare-metal machine, but I discovered that the files were not installed to that location and /opt is empty (I don't know what counts as sensitive information with Docker, so I just marked out the container ID): (screenshot)

On my bare-metal machine, the /opt/rocm folder looks like this: (screenshot)

So I suspect that ooba's system might be looking for RoCM in /opt and is perhaps not able to find it there? You'll have to forgive me - I am making assumptions based on what little I know about Docker versus the bare-metal install. If you did in fact point the container to a different location for the RoCM install and skipped "one_click.py" usage, then this probably won't be very helpful. I just wanted to share what I found.

Atinoda commented 7 months ago

Thank you for answering my questions - it's especially good news that you've had acceleration running before. I am not familiar with Unraid, but a conversation with an LLM tells me that it should be possible to pull rather than build images.

I believe that the behaviour you are seeing with CPU Extended is because the build command is not using the specified target, so it just lands on the last variant in the Dockerfile - which is default-cpu. The first thing to try is pulling, rather than building, the container, using the default-rocm image. I think that this can be done by specifying the image:tag in the Unraid GUI as atinoda/text-generation-webui:default-rocm instead of atinoda/text-generation-webui.
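On a host with a shell available, the equivalent pull would be:

```sh
# Pull the pre-built ROCM variant directly, then confirm the tag is present.
docker pull atinoda/text-generation-webui:default-rocm
docker image ls atinoda/text-generation-webui
```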

However, if you want to build it, I think that you need to modify docker-compose.yml OR specify docker-compose.build.yml in the build command. This is because Docker will always use the docker-compose.yml file by default - and this file does not specify the targets. The behaviour when building with that file is undefined... and I think that this may be what has been giving you the CPU Extended variant so far.
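For the build route, a hedged sketch of pointing compose at the build file explicitly (assuming it is run from the repo root):

```sh
# Use -f so compose reads docker-compose.build.yml (and its targets) instead of the default file.
docker compose -f docker-compose.build.yml build
docker compose -f docker-compose.build.yml up -d
```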

As for the ROCM libraries - we'll need to see what is required to actually accelerate; the installed pytorch might be enough. Regarding the contents of one_click.py - you are correct, my images do not use that script, but it is still present because the upstream source code includes it. My suggestion is that we do not consider those aspects just yet - first we'll get the ROCM variant onto your system and the container access to the GPU, and then see if it works!
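If it's useful for debugging, one hedged way to check for the ROCM PyTorch build from inside a running container (the interpreter path inside the image may differ):

```sh
# A ROCM build of PyTorch reports a HIP version; `True` means the GPU is visible.
docker exec -it <container_name> python -c "import torch; print(torch.version.hip, torch.cuda.is_available())"
```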

Alkali-V2 commented 7 months ago

RoCM support is 100% working with my RX 6800! You were absolutely correct - building was me trying to do things on hard mode. After installing the Unraid Docker Compose plugin, I was able to edit the docker-compose.yml file as you mentioned and pull the container. It pulled without issue, tagged RoCM correctly, and shows up as 'default-rocm' when I run 'docker image ls'.

(screenshot: docker image ls output)

I had to make one critical change in my Unraid configuration, where I specify the tag of the Docker image I created, as seen here: (screenshot). If I didn't specify that tag, Docker would download and install the Nvidia default instead.

With that done, it launched without issue and is outputting drastically better numbers using the AMD GPU with llama-2-7b: (screenshot)

So it looks like your feature is tested and working! Now I might shift my focus to an Unraid community app for this in some way. Thank you for taking the time to help me get this running, and thank you for making this Docker image possible for us to use!

Atinoda commented 7 months ago

That's really great news - thank you for working to get it running and for sharing your results! Those are good speeds and it's cool that it's via Unraid - two questions answered in one deployment.

Last suggestion for day-to-day operations - if you are using the software server-style, you might want to consider adding a version tag to the image (e.g., default-rocm-snapshot-2024-02-18) so that it doesn't update unexpectedly on you! The plain tag will only pull stable versions - but the upstream project moves quickly, and if you're relying on a consistent API or interface, it might be better to do upgrades mindfully.
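As a sketch, pinning might look like this in docker-compose.yml (the snapshot tag is the example from above - check Docker Hub for the tags that actually exist):

```yaml
services:
  text-generation-webui:
    # Pinned snapshot tag - upgrades happen only when you change this line.
    image: atinoda/text-generation-webui:default-rocm-snapshot-2024-02-18
```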