Closed: userbox020 closed this issue 5 months ago
Have you tried with --no-mmap?
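For context, --no-mmap makes llama.cpp read the weights into memory up front instead of memory-mapping the file, where pages are only faulted in lazily as tensors are first touched. A minimal Python sketch of the difference between the two loading styles (the dummy file here is illustrative, not a real GGUF):

```python
import mmap
import os
import tempfile

# Write a small dummy "model" file to demonstrate the two loading styles.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * (1 << 20))  # 1 MiB of zeros
    path = f.name

# Style 1: mmap (llama.cpp's default). Pages are faulted in lazily on
# first access, so "loading" can stall later if the backing storage or
# bus is slow.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_byte = mm[0]  # touching a byte triggers a page fault
    mm.close()

# Style 2: --no-mmap. The whole file is read into RAM up front, so the
# I/O cost is paid once, sequentially, at startup.
with open(path, "rb") as f:
    data = f.read()

assert first_byte == 0 and len(data) == (1 << 20)
os.unlink(path)
```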
@slaren beautiful bro, now it's taking about 5 minutes to load codellama 70b
20:08:25-911759 INFO Loading codellama-70b-python.Q4_K_M.gguf
20:08:26-000032 INFO llama.cpp weights detected: models/codellama-70b-python.Q4_K_M.gguf
llm_load_tensors: ggml ctx size = 1.38 MiB
llm_load_tensors: offloading 80 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 81/81 layers to GPU
llm_load_tensors: ROCm0 buffer size = 10944.75 MiB
llm_load_tensors: ROCm1 buffer size = 7655.06 MiB
llm_load_tensors: ROCm2 buffer size = 10533.06 MiB
llm_load_tensors: ROCm3 buffer size = 10229.84 MiB
llm_load_tensors: ROCm_Host buffer size = 140.70 MiB
....................................................................................................
20:13:22-266706 INFO LOADER: llama.cpp
20:13:22-267434 INFO TRUNCATION LENGTH: 4096
20:13:22-268065 INFO INSTRUCTION TEMPLATE: Alpaca
20:13:22-268644 INFO Loaded the model in 296.36 seconds.
Downloading a bigger quant right now to test it out. By the way, I'm noticing the --numa flag: does it help performance too?
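As an aside, --numa is about distributing memory allocations across NUMA nodes, so it mainly helps on multi-socket machines and should make little difference on a single-node desktop. A rough Linux-only sketch for checking how many NUMA nodes the kernel reports (the parsing helper is mine, not llama.cpp code):

```python
def parse_node_list(spec: str) -> list[int]:
    """Parse a Linux node list such as '0', '0-1', or '0,2-3'."""
    nodes = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            nodes.extend(range(int(lo), int(hi) + 1))
        elif part:
            nodes.append(int(part))
    return nodes

def numa_nodes() -> list[int]:
    """Read the online NUMA nodes from sysfs; fall back to [0]."""
    try:
        with open("/sys/devices/system/node/online") as f:
            return parse_node_list(f.read().strip())
    except OSError:
        return [0]

# --numa is unlikely to help unless this reports more than one node.
print(numa_nodes())
```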
Well, going to close the issue, but I would like to keep chatting with you guys. Do you have a Discord? I'm doing some tests and trying to enable Vulkan and Kompute for AMD. @slaren
Hello
I have been testing llama.cpp on Ubuntu 22.04 with ROCm 5.6. It took me about 3 months to set up multi-GPU: one RX 6900, two RX 6800, and one RX 6700, all running together on PCIe x1 gen1.
llama.cpp seems to be the only LLM loader that works with this setup, but I have noticed that when the model is above 30 GB in size it gets stuck loading. Sometimes it takes between 1 and 2 hours to load, but once loaded it does inference really fast. Other times it just gets stuck; the longest I have waited is 24 hours and it stayed stuck, the dots didn't move.
It's weird because this only happens with models above 30 GB; all other models load fast and do inference fast.
What could be causing this? Any idea how I can debug this to find out what's going on?
Any idea, suggestion, or help is very welcome, thanks.