Tested the branch above and it's working now!
{"error":null,"processed":true,"message":"completed","progress":100,"file_size":"","downloaded_size":""}
I did trip a little because the actual model name is openllama-3b, not the same as the yaml file name, but it's responding now:
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "openllama-3b",
"messages": [{"role": "user", "content": "How are you?"}],
"temperature": 0.9
}'
Response:
{
  "object": "chat.completion",
  "model": "openllama-3b",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "I am fine.\n\nHow do I write this sentence in English?\n\nA: You can use the following:\n\nI am fine.\n\n"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}
Llama2 is still not working though
curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
"url": "github:go-skynet/model-gallery/llama2-chat.yaml"
}'
This gave me an instant {"error":null,"processed":true,"message":"completed","progress":100,"file_size":"","downloaded_size":""}
But when I curl the model it returns a 500:
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "llama2-13b-chat",
"messages": [{"role": "user", "content": "How are you?"}],
"temperature": 0.9
}'
{"error":{"code":500,"message":"could not load model: rpc error: code = Unknown desc = failed loading model","type":""}}
So, this presumably needs better documentation eventually, but in most cases I wouldn't expect people to want to target "github:go-skynet/model-gallery/llama2-chat.yaml"
directly, as that's the "base configuration" used for all of the llama2-chat variants - and if you look at the file at that url, you can see that there's not actually a model specified for it, as it expects to be merged with a configuration that specifies which quantization you're using, etc.
Please try grabbing a specific ggml variant via the HF gallery, perhaps something like:
{
"id": "huggingface@thebloke__llama-2-13b-chat-ggml__llama-2-13b-chat.ggmlv3.q4_k_s.bin",
"name": "llama-2-13b-q4ks"
}
The HF gallery should automatically use the correct base config, but an automated job builds that gallery and could theoretically have errors. You can always override things by passing multiple configuration blocks together - see https://localai.io/models/index.html for more details.
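In other words, you'd pass that block to the same /models/apply endpoint you used above - untested on my end, but something along these lines, and then point your chat completion requests at whatever name you chose:
curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
"id": "huggingface@thebloke__llama-2-13b-chat-ggml__llama-2-13b-chat.ggmlv3.q4_k_s.bin",
"name": "llama-2-13b-q4ks"
}'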
This works, thanks! Some more detailed documentation would indeed be super helpful - even just a few more examples would go a long way. Now that people are running Llama 2 everywhere, it's a good time for LocalAI to grow big.
I'm integrating LocalAI into my https://github.com/logancyang/obsidian-copilot plugin now. I'll probably be around to ask more questions!
Hi @dave-gray101, I saw your change https://github.com/go-skynet/model-gallery/pull/18/files, but I don't understand how this works.
You mentioned that for a request like the one below, the gallery should automatically merge with the base config github:go-skynet/model-gallery/llama2-chat.yaml - so the prompt template you have there is automatically applied to this model as long as it's a Llama 2 variant? How does that work?
curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
"id": "huggingface@thebloke__luna-ai-llama2-uncensored-ggml__luna-ai-llama2-uncensored.ggmlv3.q4_k_s.bin",
"name": "llama-2-uncensored-q4ks"
}'
The request above gave me these files:
chat.tmpl
completion.tmpl
llama-2-uncensored-q4ks.yaml
luna-ai-llama2-uncensored.ggmlv3.q4_K_S.bin
where llama-2-uncensored-q4ks.yaml is:
context_size: 1024
name: llama-2-uncensored-q4ks
parameters:
  model: luna-ai-llama2-uncensored.ggmlv3.q4_K_S.bin
  temperature: 0.2
  top_k: 80
  top_p: 0.7
template:
  chat: chat
  completion: completion
and chat.tmpl is:
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{{.Input}}
### Response:
Where did this come from, and is the new llama2-chat.yaml template supposed to replace this?
Kudos to @dave-gray101, it works now! The issue is in the chat.tmpl that is auto-generated by /apply - it needs to be manually updated according to the template in the model card.
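For anyone who hits the same thing, the Llama 2 chat format from the model card looks roughly like the sketch below once translated into a LocalAI template - the system prompt here is just a placeholder I picked, so double-check the model card for the exact wording:
[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

{{.Input}} [/INST]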
Adding that instruction to my Obsidian plugin's doc.
Closing for now.
I'm new to LocalAI and am trying to install models by following the instructions here.
Here's what I did.
Not sure what I did wrong here, can someone please help take a look?
Some background on what I'm trying to do: I'm trying to run Llama 2 with LocalAI but can't figure out the right way to install it.
I tried this:
Downloaded llama-2-7b-chat.ggmlv3.q4_0.bin manually
cp <my path to llama-2-7b-chat.ggmlv3.q4_0.bin> models/llama-2-7b-chat
./local-ai --models-path ./models/ --debug (which works for ggml-gpt4all-j)
curl http://localhost:8080/v1/models/ shows llama2-7b-chat correctly, it seems.
Then it gave me a bunch of random gibberish, not sure why??
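From the gallery-generated files shown earlier in this thread, it looks like LocalAI also wants a per-model YAML config and prompt templates next to the binary - a rough sketch of what I think that would be for this model, with the name, context_size, and template entries being guesses on my part:
name: llama2-7b-chat
context_size: 1024
parameters:
  model: llama-2-7b-chat
template:
  chat: chat
  completion: completion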