There are reports from upstream llama.cpp that CLBlast is not well supported.
You need 16GB of RAM for a 13B model. Try using the 7B model instead: just run the 7B model with OpenBLAS and see how that goes.
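Something like this is enough for a quick smoke test (a sketch only; the model path is a placeholder, and the small `n_ctx`/`max_tokens` values are just there to keep RAM usage down):

```python
from llama_cpp import Llama

# Load a 7B ggml model (path is a placeholder) with a small context
# window to keep RAM usage down while testing the OpenBLAS build.
llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin", n_ctx=512)

# Generate a handful of tokens just to confirm the build works.
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])
```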
> There are reports from upstream llama.cpp that CLBlast is not well supported.
It should be well supported; a pretty good PR was merged around 6 days ago. Got any reference to this from after that PR?
On a side note, I'm having similar problems getting CLBlast GPU acceleration working with llama-cpp-python. CLBlast builds from the llama.cpp releases work as expected.
> It should be well supported; a pretty good PR was merged around 6 days ago. Got any reference to this from after that PR?
You're right. Up until a week or so ago there were issues, but things seem a lot more stable since that PR.
Hey! I'm going to look into that.
So should I try OpenBLAS? And can OpenBLAS work with my AMD GPU?
OpenCL support for AMD GPUs seems to have been added to llama.cpp. The latest llama-cpp-python looks to include the version of llama.cpp that adds OpenCL support.
OpenBLAS is CPU only, sorry.
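One way to check what your installed wheel was actually built with (a sketch; it assumes the low-level `llama_print_system_info` binding is exposed by your llama-cpp-python version, mirroring llama.cpp's C function of the same name):

```python
import llama_cpp

# Prints build flags such as "BLAS = 1" when the wheel was compiled
# against a BLAS/CLBlast backend; "BLAS = 0" means a plain CPU build.
print(llama_cpp.llama_print_system_info().decode("utf-8"))
```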
OK! So I retried with CLBlast and it works! But I have some other problems that aren't related to the main topic...
I think I should open a new topic, but they're just some simple questions.
1. I see that the model stores the old conversational prompt: when I completely restart the program it gives me back old tokens. So I want to reset the model, but I don't know how to do it...
2. I see that the model doesn't recognize old tokens and has to re-evaluate everything between two chat_completion calls, so generation takes a long time and I want to reduce that! I see that in koboldcpp the eval is done only once and afterwards the model remembers what was said.
3. How do I generate token by token and get the tokens in a for loop, so I can print them as they are produced? (See the sketch just below this list.)
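For reference, something along these lines may cover all three points (a sketch only, assuming the `Llama.reset()`, `Llama.set_cache()` with `LlamaCache`, and `stream=True` APIs in recent llama-cpp-python releases; the model path is a placeholder):

```python
from llama_cpp import Llama, LlamaCache

llm = Llama(model_path="./models/13B/ggml-model-q4_0.bin", n_ctx=2048)

# 1. Drop any previously evaluated tokens so the model starts fresh.
llm.reset()

# 2. Cache evaluated prompts so a shared prefix between two chat
#    completions does not have to be re-evaluated from scratch.
llm.set_cache(LlamaCache())

# 3. Stream the completion token by token and print each piece as
#    soon as it is generated.
for chunk in llm("Q: What is llama.cpp? A:", max_tokens=64, stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)
print()
```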
Thanks in advance!
Yes, please open a new ticket describing what you expected to happen and what actually happened.
Copy/paste the text output and use GitHub markdown to make your examples easy to read.
Expected Behavior
Hello! I'm a bit young so I don't speak English very well. I discovered llama not long ago, and I was immediately interested in llama-cpp-python because it's a simple way for me to integrate llama into my projects. So basically I have this code:
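Roughly something like this (a minimal sketch; the model path and the prompt are just placeholders):

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/13B/ggml-model-q4_0.bin", n_ctx=2048)

# Ask for my message, send it to the model, and print the reply.
message = input("You: ")
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": message}]
)
print(response["choices"][0]["message"]["content"])
```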
It is supposed to ask me for my message and display the answer to me.
The output:
The response is bad, but you get the idea.
But unfortunately I have a lot of lag! First of all, my RAM saturates quickly and my PC freezes, which makes the situation frustrating (I had to restart my PC, it's that buggy).
I use CLBlast for my amazing AMD graphics card.
The problem is that I am using CLBlast but my GPU stays at 0-3% usage... So I tried to increase the n_gpu_layers parameter, but it still doesn't work (I don't know what it does, so...).
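Concretely, what I tried looks roughly like this (a sketch; the model path and the layer count are placeholders, and `n_gpu_layers` only has an effect when the wheel was built with GPU support):

```python
from llama_cpp import Llama

# Offload some transformer layers to the GPU; this only works when
# llama-cpp-python was built with a GPU backend such as CLBlast.
llm = Llama(
    model_path="./models/13B/ggml-model-q4_0.bin",
    n_ctx=2048,
    n_gpu_layers=32,  # placeholder value; tune it to fit the card's VRAM
)
```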
Current Behavior
AMD RX 6600 XT, Intel Core i5-10400F, 16 GB RAM @ 3200 MHz, Windows 11
Thanks in advance! PS: I use a 13B q4_0 llama model.