ggerganov / llama.cpp

LLM inference in C/C++
MIT License
66.14k stars 9.5k forks

How to prepare&run the model #4808

Closed DerrickYLJ closed 6 months ago

DerrickYLJ commented 9 months ago

On Mac M1 Chip, I have encountered the following two problems.

(1) While following the instructions for the "prepare&run model" step, I noticed that I need to run the following command:

# obtain the original LLaMA model weights and place them in ./models
ls ./models
65B 30B 13B 7B tokenizer_checklist.chk tokenizer.model

However, when I copied and pasted the command into my terminal, it said zsh: command not found: 65B. What is the correct way to run the model?

(2) Moreover, when I try to follow the instruction command ./main -m models/llama-13b-v2/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e and run it, it fails with error loading model: failed to open models/llama-13b-v2/ggml-model-q4_0.gguf: No such file or directory. I assume this issue is related to the first one, so I am wondering how to resolve it.

Thanks!

morpheus2448 commented 9 months ago

This link will take you to TheBloke's GGML-format models, listed by most downloads.

  1. Choose a model (a 7B parameter model will work even with 8GB RAM) like Llama-2-7B-Chat-GGML.
  2. Click the Files and versions tab.
  3. Use the download link to the right of a file to download the model file - I recommend the q5_0 version.
  4. When the file is downloaded, move it to the models folder.
  5. Update your run command with the correct model filename.
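The steps above can be sketched as a short shell session. The repository name and filename below are examples matching TheBloke's Llama-2-7B-Chat-GGML repo; check the Files and versions tab for the exact name of the q5_0 file you download.

```shell
# Hypothetical example of steps 3-5; the filename is an assumption,
# substitute whichever file you actually picked from the Files tab.
MODEL=llama-2-7b-chat.ggmlv3.q5_0.bin

# step 4: make sure the models folder exists, then move the download there
mkdir -p models

# step 3: download (needs network), e.g.:
# curl -L -o "models/$MODEL" \
#   "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/$MODEL"

# step 5: update the run command with the downloaded filename:
# ./main -m "models/$MODEL" -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e
echo "run command will use: models/$MODEL"
```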

pudepiedj commented 9 months ago

Hi, I think the second line in your first question is supposed to be the output of the first line, so you shouldn't type it into the zsh terminal. 65B 30B 13B 7B tokenizer_checklist.chk tokenizer.model is what you get when you list the contents of ./models; the first four entries are directories containing the 65B, 30B, 13B and 7B models. I suggest you run ls -R ./models to see the contents of all the subdirectories. Unless you have a mega machine with 192GB of VRAM you almost certainly won't be able to run the 65B or 30B models, so you can delete them and save disk space.

Your second problem will then be solved, because you will be able to see the correct path to whichever model you want to run. Once you know where your downloaded models are located, say in ./models/13B, you can change your command line to something like:

./main -m ./models/13B/ggml-model-q4_0.gguf

and it should find the model, provided your directory structure holds it inside the ./models/13B directory and you have used cmake/make to put the main executable into the current working directory.
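A small sketch of that advice: check that the file actually exists at the path before invoking ./main. The path below is an example; substitute the file you downloaded.

```shell
# Sketch: verify the model path before running ./main.
check_model() {
    if [ -f "$1" ]; then
        echo "found: $1"
        # safe to run, e.g.:
        # ./main -m "$1" -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e
    else
        echo "missing: $1 (try: find ./models -name '*.gguf')"
        return 1
    fi
}

# example path from the thread; || true so a missing file just reports
check_model ./models/13B/ggml-model-q4_0.gguf || true
```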

DerrickYLJ commented 9 months ago

Hello @pudepiedj and @morpheus2448, thanks for your reply!

I have tried typing the command ls ./models, but the output is as follows: ggml-vocab-aquila.gguf ggml-vocab-gpt-neox.gguf ggml-vocab-mpt.gguf ggml-vocab-starcoder.gguf ggml-vocab-baichuan.gguf ggml-vocab-gpt2.gguf ggml-vocab-refact.gguf ggml-vocab-falcon.gguf ggml-vocab-llama.gguf ggml-vocab-stablelm-3b-4e1t.gguf rather than 65B 30B 13B 7B tokenizer_checklist.chk tokenizer.model. Does this mean I need to manually download the model weights I want to use?

pudepiedj commented 9 months ago

It looks as if you have cloned the repository but not downloaded any model weights, so it depends what you want. To get any of the original llama-2 models you will need to register with Meta (it's free) and download one of their models, because I don't think they are available elsewhere. If you want to run models from HuggingFace, for example @TheBloke's gguf conversions, you can download them from one of his folders. Those weights need to go into ./models or a subdirectory, depending on how you want to organise things.

Whatever you do, you will need to build the llama.cpp code that comes with the repo using cmake/make, by following the installation instructions, because that is how you get executables like main with which to run the models. It might be worth deleting whatever you've already got, running git clone again, either from the @ggerganov repo or from a fork of your own, and starting with a clean installation.
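The clean-install flow above can be summarised as a short session. The clone and build need network access and a toolchain, so they are shown as comments; the last lines just illustrate the ./models layout the earlier comments assume (the 7B subdirectory is a hypothetical example).

```shell
# Clean reinstall, as suggested above:
# git clone https://github.com/ggerganov/llama.cpp
# cd llama.cpp
# make          # or cmake; either way this builds the ./main executable

# Downloaded weights then go under ./models, e.g. in a per-model subdirectory:
mkdir -p models/7B      # hypothetical layout
ls models
```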

github-actions[bot] commented 6 months ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 6 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.