go-skynet / model-gallery

:card_file_box: a curated collection of models ready-to-use with LocalAI
https://localai.io/models/
Apache License 2.0

[MBA m1] Cannot install openllama_3b nor llama2-chat #16

Closed logancyang closed 1 year ago

logancyang commented 1 year ago

I'm new to LocalAI and am trying to install models by following the instructions here.

Here's what I did:

GALLERIES='[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]' ./local-ai --models-path ./models/ --debug

# Check if model is available.
curl http://localhost:8080/models/available | jq '.[] | select(.name | contains("llama2"))'

# It successfully listed a bunch

# Trial 1: install a model from the gallery
curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
     "url": "https://github.com/go-skynet/model-gallery/blob/main/openllama_3b.yaml"
   }'

# Here it fails
# {"error":{},"processed":true,"message":"error: yaml: line 136: mapping values are not allowed in this context","progress":0,"file_size":"","downloaded_size":""}%

# Trial 2
curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
     "url": "github:go-skynet/model-gallery/llama2-chat.yaml"
   }'
# Fails with
# {"error":{},"processed":true,"message":"error: yaml: line 6: could not find expected ':'","progress":0,"file_size":"","downloaded_size":""}

I'm not sure what I did wrong here; can someone please take a look?


Some background on what I'm trying to do: I want to run Llama 2 with LocalAI but can't figure out the right way to install it.

I tried this:

  1. Go to https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/tree/main and download
    llama-2-7b-chat.ggmlv3.q4_0.bin manually
  2. In the LocalAI dir, cp <my path to llama-2-7b-chat.ggmlv3.q4_0.bin> models/llama-2-7b-chat
  3. Start LocalAI with ./local-ai --models-path ./models/ --debug (which works for ggml-gpt4all-j)
  4. curl http://localhost:8080/v1/models/ lists llama-2-7b-chat correctly, it seems. Then
    curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "llama-2-7b-chat",
     "messages": [{"role": "user", "content": "who are you?"}],
     "temperature": 0.9
    }'

It gave me a bunch of random gibberish, and I'm not sure why:

{
  "object": "chat.completion",
  "model": "llama-2-7b-chat",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": " I hope you're doing well. Unterscheidung between the two is not always clear-cut and can vary depending on the context and perspective. Here are some key differences:\n1. Meaning: The term \"culture\" refers to the shared beliefs, values, customs, behaviors, and artifacts of a particular group or society. On the other hand, the term \"subculture\" refers to a sub-group within a larger culture, often defined by a shared set of beliefs, values, customs, behaviors, and artifacts that are distinct from those of the larger culture.\n2. Scope: Culture is a broader term that encompasses a wide range of social phenomena, including language, religion, customs, traditions, arts, literature, music, dance, food, dress, and many other aspects of human society. On the other hand, subculture is a narrower term that refers to a specific sub-group within a larger culture, often defined by a shared set of beliefs, values, customs, behaviors, and artifacts that are distinct from those of the larger culture.\n3. Level: Culture is a broader term that encompasses a wide range of social phenomena at various levels of society, including micro-level (e.g., individual interactions), meso-level (e.g.,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}
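
(A likely factor, though not confirmed in this thread: dropping only the .bin into models/ gives LocalAI no YAML config and no prompt template, so the raw message goes to the model as-is. A config of the kind the gallery generates, with the filenames here purely illustrative, looks roughly like this:)

context_size: 1024
name: llama-2-7b-chat
parameters:
  model: llama-2-7b-chat.ggmlv3.q4_0.bin
  temperature: 0.2
template:
  chat: chat               # i.e. a chat.tmpl file in the same models directory
  completion: completion   # i.e. a completion.tmpl file
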
logancyang commented 1 year ago

Tested the branch above and it's working now!

{"error":null,"processed":true,"message":"completed","progress":100,"file_size":"","downloaded_size":""}

I did trip up a little because the actual model name is openllama-3b, which is not the same as the YAML file name, but it's responding now:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "openllama-3b",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.9
   }'

Response:

{
  "object": "chat.completion",
  "model": "openllama-3b",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "I am fine.\n\nHow do I write this sentence in English?\n\nA: You can use the following:\n\nI am fine.\n\n"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}
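
(Side note: a quick way to see what name a model ends up registered under is the listing endpoint used earlier in this thread:)

curl http://localhost:8080/v1/models/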

Llama 2 is still not working, though:

curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
     "url": "github:go-skynet/model-gallery/llama2-chat.yaml"
   }'

This gave me instant {"error":null,"processed":true,"message":"completed","progress":100,"file_size":"","downloaded_size":""}

But when I curl the model, it returns a 500:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "llama2-13b-chat",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.9
   }'
{"error":{"code":500,"message":"could not load model: rpc error: code = Unknown desc = failed loading model","type":""}}
dave-gray101 commented 1 year ago

So, this presumably needs better documentation eventually, but in most cases I wouldn't expect people to want to target "github:go-skynet/model-gallery/llama2-chat.yaml" directly, as that's the "base configuration" used for all of the llama2-chat variants. If you look at the file at that URL, you can see that no model is actually specified for it; it expects to be merged with a configuration that specifies which quantization you're using, etc.

Please try grabbing a specific ggml variant via the HF gallery, perhaps something like:

{
    "id": "huggingface@thebloke__llama-2-13b-chat-ggml__llama-2-13b-chat.ggmlv3.q4_k_s.bin",
    "name": "llama-2-13b-q4ks"
}

The HF gallery should automatically use the correct base config, but an automated job builds that gallery and could theoretically have errors. You can always override things by passing multiple configuration blocks together - see https://localai.io/models/index.html for more details.
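
For reference, the snippet above is just the request body; the full call, using the same /models/apply endpoint as earlier in this thread, would look roughly like:

curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
     "id": "huggingface@thebloke__llama-2-13b-chat-ggml__llama-2-13b-chat.ggmlv3.q4_k_s.bin",
     "name": "llama-2-13b-q4ks"
   }'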

logancyang commented 1 year ago

> So, this presumably needs better documentation eventually, but in most cases I wouldn't expect people to want to target "github:go-skynet/model-gallery/llama2-chat.yaml" directly, as that's the "base configuration" used for all of the llama2-chat variants. If you look at the file at that URL, you can see that no model is actually specified for it; it expects to be merged with a configuration that specifies which quantization you're using, etc.
>
> Please try grabbing a specific ggml variant via the HF gallery, perhaps something like:
>
> {
>     "id": "huggingface@thebloke__llama-2-13b-chat-ggml__llama-2-13b-chat.ggmlv3.q4_k_s.bin",
>     "name": "llama-2-13b-q4ks"
> }
>
> The HF gallery should automatically use the correct base config, but an automated job builds that gallery and could theoretically have errors. You can always override things by passing multiple configuration blocks together - see https://localai.io/models/index.html for more details.

This works, thanks! More detailed documentation would indeed be super helpful; even just a few more examples would go a long way. Now that people are running Llama 2 everywhere, it's a good time for LocalAI to grow.

I'm integrating LocalAI into my https://github.com/logancyang/obsidian-copilot plugin now. I'll probably be around to ask more questions!

logancyang commented 1 year ago

Hi @dave-gray101, I saw your change https://github.com/go-skynet/model-gallery/pull/18/files, but don't understand how this works.

You mentioned that for a request like the one below, the gallery should automatically merge it with the base config github:go-skynet/model-gallery/llama2-chat.yaml, so the prompt template you define there is automatically applied to this model as long as it's a Llama 2 variant? How does that work?

curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
    "id": "huggingface@thebloke__luna-ai-llama2-uncensored-ggml__luna-ai-llama2-uncensored.ggmlv3.q4_k_s.bin",
    "name": "llama-2-uncensored-q4ks"
    }'

The above gave me:

chat.tmpl
completion.tmpl
llama-2-uncensored-q4ks.yaml
luna-ai-llama2-uncensored.ggmlv3.q4_K_S.bin

where llama-2-uncensored-q4ks.yaml is

context_size: 1024
name: llama-2-uncensored-q4ks
parameters:
  model: luna-ai-llama2-uncensored.ggmlv3.q4_K_S.bin
  temperature: 0.2
  top_k: 80
  top_p: 0.7
template:
  chat: chat
  completion: completion

and chat.tmpl (which the "chat: chat" entry above appears to point at) is

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{{.Input}}

### Response:

Where did this come from, and is the new llama2-chat.yaml template supposed to replace this?

logancyang commented 1 year ago

full_log.txt (full log attached)

Response:

(screenshot attached: Screenshot 2023-08-01 at 2 24 45 PM)
logancyang commented 1 year ago

Kudos to @dave-gray101, it works now! The issue was in the chat.tmpl that is auto-generated by /apply; it needs to be manually updated to match the prompt template in the model card.
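
(As an illustration only: for the Llama 2 chat models, the model card documents an [INST]/<<SYS>> style prompt, which in LocalAI's .tmpl form, using the {{.Input}} placeholder seen above, would look roughly like the sketch below. The exact system prompt and spacing should come from the specific model's card.)

[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

{{.Input}} [/INST]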

Adding that instruction to my Obsidian plugin's doc.

Closing for now.