Closed cyberpython closed 12 months ago
Duplicate of bentoml/BentoML#3985
qq: Can you check if the following path exists on Linux: /opt/rocm/libexec/rocm_smi?
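For anyone else running this check, here is a small stdlib-only sketch (the helper name is mine, not part of ROCm or OpenLLM):

```python
import os
from typing import Optional

def rocm_smi_dir(prefix: str = "/opt/rocm") -> Optional[str]:
    """Return the rocm_smi helper directory under a ROCm prefix,
    or None when that directory does not exist."""
    path = os.path.join(prefix, "libexec", "rocm_smi")
    return path if os.path.isdir(path) else None

print(rocm_smi_dir() or "no ROCm install under /opt/rocm")
```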
Yep (it looks like I have two ROCm installations for some reason, one under /opt/rocm and another under /opt/rocm-5.5.0, with the latter being the one in use):
~ $ ls -al /opt/rocm/libexec/rocm_smi
total 176
drwxr-xr-x 2 root root 4096 Jun 23 00:50 .
drwxr-xr-x 6 root root 4096 Jun 24 14:28 ..
-rwxrwxr-x 1 root root 148963 Apr 19 03:03 rocm_smi.py
-rw-r--r-- 1 root root 19229 Apr 19 03:03 rsmiBindings.py
~ $ ls -al /opt/rocm-5.5.0/libexec/rocm_smi/
total 176
drwxr-xr-x 2 root root 4096 Jun 23 00:50 .
drwxr-xr-x 6 root root 4096 Jun 24 14:28 ..
-rwxrwxr-x 1 root root 148963 Apr 19 03:03 rocm_smi.py
-rw-r--r-- 1 root root 19229 Apr 19 03:03 rsmiBindings.py
Is /opt/rocm-5.5.0 symlinked back to /opt/rocm?
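One way to answer that without guessing (a hypothetical stdlib helper, not part of any library) is to compare the resolved paths:

```python
import os

def same_install(a: str, b: str) -> bool:
    """True when the two paths resolve to the same location,
    i.e. one is a symlink to the other (or both to a third path)."""
    return os.path.realpath(a) == os.path.realpath(b)

# e.g. same_install("/opt/rocm", "/opt/rocm-5.5.0")
```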
I have preliminary support for AMD GPUs on the main branch. You can try it out:
pip install git+https://github.com/bentoml/openllm@main
This has been included in 0.1.19.
Awesome!! Thanks!
I installed 0.1.19 from PyPI and tried to run the StarCoder model, but I still get "No GPU available, therefore this command is disabled". I was not expecting my GPU to be able to load StarCoder (not enough vRAM), but it should at least try to.
Installation & execution steps:
~$ python3 -m venv env
~$ source env/bin/activate
~$ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
~$ pip3 install openllm==0.1.19
~$ pip install "openllm[starcoder]"
~$ openllm start starcoder
PyTorch reports that GPU is available:
~$ python3 -c "import torch;print(torch.cuda.is_available())"
True
rocminfo
output:
~$ rocminfo
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 7 5825U with Radeon Graphics
Uuid: CPU-XX
Marketing Name: AMD Ryzen 7 5825U with Radeon Graphics
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2000
BDFID: 0
Internal Node ID: 0
Compute Unit: 16
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 12351708(0xbc78dc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 12351708(0xbc78dc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 12351708(0xbc78dc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx90c
Uuid: GPU-XX
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 1024(0x400) KB
Chip ID: 5607(0x15e7)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2000
BDFID: 1792
Internal Node ID: 1
Compute Unit: 8
SIMDs per CU: 4
Shader Engines: 1
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 4194304(0x400000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx90c:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
Can you set CUDA_VISIBLE_DEVICES=0?
Just tried it, no luck. Are there any logs or output I could provide that would help?
In any case, it could be an issue with my setup; if it works on another AMD setup, I suggest this issue gets closed.
I don't have my hands on an AMD GPU right now, so I'm just going off the assumption that ROCm is set up correctly.
https://github.com/bentoml/OpenLLM/blob/8ac2755de4468225a5f072c1e548f5cf9978c32d/src/openllm/_strategies.py#L201 This is where we handle it. Support is tentative, since I don't really have access to the hardware to test this out.
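For debugging detection issues like the one above, here is a best-effort way to see whether the OS exposes any AMD GPUs at all, independent of ROCm or PyTorch. Stdlib only; the function name and the sysfs scan are my own sketch, not what _strategies.py actually does:

```python
import glob

AMD_VENDOR_ID = "0x1002"  # PCI vendor ID for AMD/ATI

def visible_amd_gpus() -> list:
    """Scan Linux DRM nodes and return the card names whose PCI
    vendor ID is AMD's. Returns [] on non-Linux systems or when
    no AMD device is present."""
    gpus = []
    for vendor_file in sorted(glob.glob("/sys/class/drm/card*/device/vendor")):
        try:
            with open(vendor_file) as f:
                if f.read().strip() == AMD_VENDOR_ID:
                    # .../card0/device/vendor -> "card0"
                    gpus.append(vendor_file.rsplit("/", 3)[1])
        except OSError:
            pass
    return gpus

print(visible_amd_gpus())
```

If this prints an empty list while rocminfo sees the GPU, the problem is more likely in the detection code than in the ROCm install.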
Feature request
It would be nice to have the option to use AMD GPUs that support ROCm.
PyTorch seems to support AMD GPUs via ROCm on Linux; this was tested on Ubuntu 22.04.2 LTS with an AMD Ryzen 5825U (Radeon Vega Barcelo 8-core, shared memory), ROCm 5.5.0, and PyTorch for ROCm 5.4.2 installed.
A cursory look seems to indicate that there are currently only CPU and NVIDIA Resource implementations in BentoML:
https://github.com/bentoml/BentoML/blob/c34689050ce1a2be2ee6a8809629cd715ae50ea6/src/bentoml/_internal/resource.py#L217
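As a sketch of what an AMD counterpart to those Resource implementations might probe (the function name, the env-var convention, and the rocm-smi parsing here are assumptions on my part, not BentoML API):

```python
import os
import shutil
import subprocess

def count_amd_gpus() -> int:
    """Hypothetical system-resource probe in the spirit of BentoML's
    NVIDIA GPU resource: honor ROCR_VISIBLE_DEVICES when set,
    otherwise ask rocm-smi; fall back to 0 when ROCm is absent."""
    visible = os.environ.get("ROCR_VISIBLE_DEVICES")
    if visible is not None:
        return 0 if visible in ("", "-1") else len(visible.split(","))
    if shutil.which("rocm-smi") is None:
        return 0
    try:
        out = subprocess.run(
            ["rocm-smi", "--showid"], capture_output=True, text=True, check=True
        ).stdout
    except subprocess.CalledProcessError:
        return 0
    # Each "GPU[N]" line in rocm-smi output corresponds to one device.
    return sum(1 for line in out.splitlines() if line.strip().startswith("GPU["))
```

The env-var short-circuit mirrors how CUDA_VISIBLE_DEVICES is treated for NVIDIA, which would also explain why setting CUDA_VISIBLE_DEVICES=0 alone does not help on a ROCm box.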
Motivation
This feature would enable running models with OpenLLM on AMD GPUs with ROCm support.
Other
No response