cocktailpeanut / dalai

The simplest way to run LLaMA on your local machine
https://cocktailpeanut.github.io/dalai

LLaMa 30B not working #425

Open · McGamerComunity opened this issue 1 year ago

McGamerComunity commented 1 year ago

I tried to install the LLaMA 30B model and it's not working (Alpaca 30B does work, but I don't want that one).

Operating System: Windows
CPU: AMD Ryzen 5 2600
Version of Dalai: Docker

Error:

```
/root/dalai/llama/main --seed -1 --threads 4 --n_predict 200 --model models/30B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.8 --repeat_last_n 64 --repeat_penalty 1.3 -p "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:

what are tomatoes

### Response:

"
exit
root@be8db5ed8cbc:~/dalai/llama# /root/dalai/llama/main --seed -1 --threads 4 --n_predict 200 --model models/30B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.8 --repeat_last_n 64 --repeat_penalty 1.3 -p "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:

what are tomatoes

### Response:

"
main: seed = 1682509900
llama_model_load: loading model from 'models/30B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 6656
llama_model_load: n_mult = 256
llama_model_load: n_head = 52
llama_model_load: n_layer = 60
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 17920
llama_model_load: n_parts = 4
llama_model_load: ggml ctx size = 20951.50 MB
Segmentation fault
root@be8db5ed8cbc:~/dalai/llama# exit
exit
```

Information to add:

I used this command to install the 30B model:

```
docker compose run dalai npx dalai llama install 30B
```
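Since a truncated or incomplete download can segfault the same way, it may be worth first confirming that the converted weights are complete (a minimal check, assuming the container paths shown in the log above):

```
# list the converted 30B weights inside the dalai container; the q4_0
# file should be on the order of 20 GB (the log reports a ~20951 MB ctx)
docker compose run dalai ls -lh /root/dalai/llama/models/30B/
```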

and I downloaded the LLaMA files from these URLs, which came from a GitHub post:

"My download is really slow or keeps getting interrupted": try downloading the model(s) manually through the browser.

LLaMA 7B
https://agi.gpt4.org/llama/LLaMA/7B/consolidated.00.pth

13B
https://agi.gpt4.org/llama/LLaMA/13B/consolidated.00.pth
https://agi.gpt4.org/llama/LLaMA/13B/consolidated.01.pth

30B (I used these 4)
https://agi.gpt4.org/llama/LLaMA/30B/consolidated.00.pth
https://agi.gpt4.org/llama/LLaMA/30B/consolidated.01.pth
https://agi.gpt4.org/llama/LLaMA/30B/consolidated.02.pth
https://agi.gpt4.org/llama/LLaMA/30B/consolidated.03.pth

65B
https://agi.gpt4.org/llama/LLaMA/65B/consolidated.00.pth
https://agi.gpt4.org/llama/LLaMA/65B/consolidated.01.pth
https://agi.gpt4.org/llama/LLaMA/65B/consolidated.02.pth
https://agi.gpt4.org/llama/LLaMA/65B/consolidated.03.pth
https://agi.gpt4.org/llama/LLaMA/65B/consolidated.04.pth
https://agi.gpt4.org/llama/LLaMA/65B/consolidated.05.pth
https://agi.gpt4.org/llama/LLaMA/65B/consolidated.06.pth
https://agi.gpt4.org/llama/LLaMA/65B/consolidated.07.pth
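If the browser downloads keep getting interrupted as well, a resumable command-line fetch is an alternative (a sketch; `wget -c` continues a partial file when the same command is re-run):

```
# fetch the four 30B shards with resume support
for i in 00 01 02 03; do
  wget -c "https://agi.gpt4.org/llama/LLaMA/30B/consolidated.$i.pth"
done
```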

If you need any more information, I'll provide it.

diman82 commented 1 year ago

I get the very same error ('Segmentation fault') with the 30B model.

pratyushtiwary commented 1 year ago

I got the same error and fixed it by explicitly setting the context size.

What was the issue? In my case, the size of the model plus the context size was greater than the total available RAM, so it failed because of that.
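As a rough illustration with the numbers from the log above (n_layer = 60, n_ctx = 512, n_embd = 6656): the f32 KV cache alone costs about 2 × n_layer × n_ctx × n_embd × 4 bytes on top of the ~20951 MB of quantized weights (an estimate of ggml's K/V buffers, not an exact figure):

```
# K and V buffers: 2 x n_layer x n_ctx x n_embd x 4 bytes (f32)
echo "$(( 2 * 60 * 512 * 6656 * 4 / 1024 / 1024 )) MB"   # prints "1560 MB"
```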

When I lowered the context size below 1024, it seemed to work just fine.

I was using the Alpaca 7B model on a server with 6 GB of RAM.
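For anyone doing this from the command line rather than the UI, the equivalent workaround is to pass llama.cpp's context-size flag explicitly. A sketch modeled on the invocation from the log above (the binary/model paths and the 512 value are illustrative; `-c` / `--ctx_size` is llama.cpp's flag for the prompt context size):

```
# explicit small context window: a lower n_ctx shrinks the KV cache
# so weights + context fit in limited RAM
/root/dalai/alpaca/main -c 512 --threads 4 --n_predict 200 \
  --model models/7B/ggml-model-q4_0.bin \
  -p "what are tomatoes"
```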

I have added an option to control the context size via the UI and opened a PR for it: #424