deepankarsharma closed this issue 3 days ago
What does exist is `/opt/rocm-5.7.0/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat`. Have you tried `sudo env HSA_OVERRIDE_GFX_VERSION=10.3.0 ~/llama.cpp/build/bin/main`?
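For anyone who wants to script the suggestion above, here is a minimal sketch. It assumes the binary path from the command above and the override value proposed in this thread; whether that value suits a given GPU is not confirmed here. The helper names `env_with_gfx_override` and `run_with_override` are hypothetical.

```python
import os
import subprocess


def env_with_gfx_override(version, base_env=None):
    """Return a copy of the environment with HSA_OVERRIDE_GFX_VERSION set."""
    env = dict(os.environ if base_env is None else base_env)
    env["HSA_OVERRIDE_GFX_VERSION"] = version
    return env


def run_with_override(cmd, version="10.3.0"):
    """Run cmd (a list of arguments) with the override applied; return its exit code.

    The default "10.3.0" is only the value suggested in this thread (an assumption).
    """
    return subprocess.run(cmd, env=env_with_gfx_override(version)).returncode


# Example call (not executed here):
# run_with_override([os.path.expanduser("~/llama.cpp/build/bin/main")])
```

Building a fresh environment dict leaves the calling process untouched, so the override applies only to the child process.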
Maybe this thread gets you going?
https://github.com/ROCm/ROCm/discussions/2631#discussioncomment-7799697
Thanks @danielzgtg and @akostadinov for your input. @deepankarsharma, any luck with Daniel or Aleksandar's suggestion?
@nartmada - I will give this a try soon and post an update.
I have the same issue on a laptop with an AMD Ryzen 7 8845H w/ Radeon 780M Graphics.
Can anyone help me?
I tried many times following this guide:
https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/pytorch-install.html
I created a Docker container from these Dockerfiles:
https://github.com/badpaybad/Ner_Llm_Gpt/blob/main/mistralvn/dockerfile.base.rocm https://github.com/badpaybad/Ner_Llm_Gpt/blob/main/mistralvn/dockerfile
and ran it with:
docker run --user root --privileged -d --restart always -p 11111:8080 -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --group-add render --ipc=host --shm-size 8G --name rocm_vistral7b_8880 rocm-vistral7b
I got the following error when running the container:
`
2024-04-03 10:04:34 torch.cuda.is_available: False
2024-04-03 10:04:35 Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
2024-04-03 10:04:35 Traceback (most recent call last):
2024-04-03 10:04:35 File "/app/main.py", line 79, in
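Since the log starts with `torch.cuda.is_available: False`, it may help to gather a few facts inside the container before loading the model. This is only a diagnostic sketch: `rocm_torch_report` is a hypothetical helper, and the device paths are the ones passed to `docker run` above.

```python
import importlib.util
import os


def rocm_torch_report():
    """Collect basic facts about the PyTorch/ROCm setup as a dict."""
    report = {
        # Device nodes that the docker command maps into the container.
        "kfd_present": os.path.exists("/dev/kfd"),
        "dri_present": os.path.isdir("/dev/dri"),
        # Value of the override discussed earlier in this thread, if set.
        "gfx_override": os.environ.get("HSA_OVERRIDE_GFX_VERSION"),
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if report["torch_installed"]:
        import torch

        # torch.version.hip is None when the installed build is not a ROCm build.
        report["torch_hip_version"] = torch.version.hip
        report["cuda_available"] = torch.cuda.is_available()
    return report


print(rocm_torch_report())
```

If `torch_hip_version` is `None`, the installed wheel is not a ROCm build, so the GPU cannot be used no matter which override is set.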
@deepankarsharma Has this issue been resolved on your end? Thanks!
@deepankarsharma Closing ticket for now. Please feel free to re-open this ticket if you still see the issue with the latest ROCm and we will further investigate. Thanks!
Problem Description
I compiled llama.cpp with rocBLAS support on Ubuntu 22.04.
When I try to run llama.cpp, I get the following error:
main: build = 1641 (6744dbe)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed = 1702592822
rocBLAS error: Cannot read /opt/rocm-5.7.0/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1103
List of available TensileLibrary Files :
"/opt/rocm-5.7.0/lib/rocblas/library/TensileLibrary_lazy_gfx941.dat"
"/opt/rocm-5.7.0/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat"
"/opt/rocm-5.7.0/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat"
"/opt/rocm-5.7.0/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat"
"/opt/rocm-5.7.0/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat"
"/opt/rocm-5.7.0/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat"
"/opt/rocm-5.7.0/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat"
"/opt/rocm-5.7.0/lib/rocblas/library/TensileLibrary_lazy_gfx942.dat"
"/opt/rocm-5.7.0/lib/rocblas/library/TensileLibrary_lazy_gfx940.dat"
"/opt/rocm-5.7.0/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat"
"/opt/rocm-5.7.0/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat"
"/opt/rocm-5.7.0/lib/rocblas/library/TensileLibrary_lazy_gfx803.dat"
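The error names the requested architecture and then lists the architectures that have a Tensile library file. A small sketch that pulls both out of the message is below. `parse_rocblas_arch_error` is a hypothetical helper, and the regular expressions assume the message keeps the wording shown above.

```python
import re


def parse_rocblas_arch_error(message):
    """Return (requested_arch, sorted available arches) from a rocBLAS Tensile error."""
    requested = re.search(r"for GPU arch\s*:\s*(gfx\w+)", message)
    available = re.findall(r"TensileLibrary_lazy_(gfx\w+)\.dat", message)
    return (requested.group(1) if requested else None, sorted(set(available)))


# Shortened copy of the error above, for illustration only.
sample = (
    "rocBLAS error: Cannot read /opt/rocm-5.7.0/lib/rocblas/library/TensileLibrary.dat: "
    "Illegal seek for GPU arch : gfx1103 List of available TensileLibrary Files : "
    '"/opt/rocm-5.7.0/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat" '
    '"/opt/rocm-5.7.0/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat"'
)
arch, shipped = parse_rocblas_arch_error(sample)
print(arch, "is shipped" if arch in shipped else "is NOT shipped", shipped)
```

For the full message in this report, the requested `gfx1103` is not among the listed files, which is what the error says in a less readable form.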
Operating System
22.04.3 LTS (Jammy Jellyfish)
CPU
AMD Ryzen 9 PRO 7940HS
GPU
780M
ROCm Version
5.7.0
ROCm Component
rocBLAS
Steps to Reproduce
Expect rocblas to work with gpu arch gfx1103
Output of /opt/rocm/bin/rocminfo --support
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
==========
Agent 1
Name: AMD Ryzen 9 PRO 7940HS w/ Radeon 780M Graphics
Uuid: CPU-XX
Marketing Name: AMD Ryzen 9 PRO 7940HS w/ Radeon 780M Graphics
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 4000
BDFID: 0
Internal Node ID: 0
Compute Unit: 16
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 65025468(0x3e035bc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 65025468(0x3e035bc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 65025468(0x3e035bc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
Agent 2
Name: gfx1103
Uuid: GPU-XX
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 2048(0x800) KB
Chip ID: 5567(0x15bf)
ASIC Revision: 9(0x9)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2799
BDFID: 49920
Internal Node ID: 1
Compute Unit: 12
SIMDs per CU: 2
Shader Engines: 1
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension: x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension: x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 33
SDMA engine uCode:: 16
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 524288(0x80000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS:
Size: 524288(0x80000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1103
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension: x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension: x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
Done
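To see which target a machine reports without reading the whole dump, the agent names can be filtered out of the `rocminfo` text. This is a rough sketch: `gpu_targets` is a hypothetical helper, and it assumes GPU agents appear on a `Name:` line beginning with `gfx`, as in the output above.

```python
import re


def gpu_targets(rocminfo_text):
    """Return the gfx target names found on agent 'Name:' lines."""
    return re.findall(r"^\s*Name:\s*(gfx\w+)\s*$", rocminfo_text, flags=re.MULTILINE)


# Trimmed excerpt of the output above, for illustration only.
excerpt = """Agent 1
Name: AMD Ryzen 9 PRO 7940HS w/ Radeon 780M Graphics
Agent 2
Name: gfx1103
Uuid: GPU-XX
ISA 1
Name: amdgcn-amd-amdhsa--gfx1103
"""
print(gpu_targets(excerpt))

# On a live system the text could come from the tool itself (not executed here):
# import subprocess
# text = subprocess.run(["/opt/rocm/bin/rocminfo"], capture_output=True, text=True).stdout
```

The ISA line is skipped on purpose because its value starts with `amdgcn`, so each GPU agent is counted once.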