containers / ai-lab-recipes

Examples for building and running LLM services and applications locally with Podman
Apache License 2.0

Add the huggingface token parameter, and modify the file path in llama.cpp repo #761

Closed · melodyliu1986 closed this 1 month ago

melodyliu1986 commented 1 month ago

I want to use the mistralai/Mistral-7B-Instruct-v0.2 model, but found there are no GGUF files for it on Hugging Face, so I decided to use the ./convert_models tooling to convert the model myself. In doing so I found a few issues:

  1. 401 Client Error
huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-66b1862a-1bc229376e7f3f4020a3c951;60195d59-03d1-4f26-b3ce-d3b04c2fe2b4)
Cannot access gated repo for url https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/resolve/36d7e540e651b68dac59394d9c3381651df7fb01/.gitattributes

So I added an optional HF_TOKEN parameter in the code. If users download a public model, no token is needed; if they download a gated or private model, they must supply their Hugging Face token (see the first sketch after this list). Impacted files: README.md, download_huggingface.py, run.sh

  2. No convert.py script or quantize binary under llama.cpp
python: can't open file '/opt/app-root/src/converter/llama.cpp/convert.py': [Errno 2] No such file or directory
run.sh: line 23: llama.cpp/quantize: No such file or directory

If we go to https://github.com/ggerganov/llama.cpp.git, we can see that convert.py has been deprecated and moved to examples/convert_legacy_llama.py. I am not sure whether I should just keep the line "python llama.cpp/convert-hf-to-gguf.py /opt/app-root/src/converter/converted_models/$hf_model_url"; for now I replaced convert.py with the correct path, and did the same for llama.cpp/quantize (see the second sketch after this list).

Impacted file: run.sh

  3. No image name was specified in the README.md

So I added "converter" to the "podman run" command.

  4. Added the Hugging Face token parameter to the web UI (see the third sketch after this list). Impacted file: ui.py
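
For reference, a minimal sketch of how the optional token can be threaded through download_huggingface.py, assuming the script uses huggingface_hub's snapshot_download; the flag names here are illustrative, not necessarily the exact ones in the PR:

# download_huggingface.py (sketch, not the exact PR diff)
import argparse
import os

from huggingface_hub import snapshot_download

parser = argparse.ArgumentParser()
parser.add_argument("-m", "--model", required=True,
                    help="Hugging Face repo, e.g. mistralai/Mistral-7B-Instruct-v0.2")
parser.add_argument("-t", "--token", default=os.getenv("HF_TOKEN"),
                    help="Optional token; only needed for gated/private repos")
args = parser.parse_args()

# token=None leaves public downloads unchanged; a token is only
# required for gated repos such as mistralai/Mistral-7B-Instruct-v0.2.
snapshot_download(repo_id=args.model, token=args.token)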
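
A rough sketch of the affected run.sh lines. The gguf path variables are placeholders, and the quantize binary name is an assumption: newer llama.cpp builds ship it as llama-quantize, older ones as plain quantize, so it should match whatever the vendored revision actually produces.

# run.sh (sketch): convert step with convert.py replaced by the
# maintained converter script. $hf_model_url and $QUANTIZATION come
# from the container environment; the gguf paths are placeholders.
python llama.cpp/convert-hf-to-gguf.py /opt/app-root/src/converter/converted_models/$hf_model_url

# Newer llama.cpp builds rename the binary to llama-quantize;
# use whichever name the image actually contains.
llama.cpp/llama-quantize "$f32_gguf_path" "$quantized_gguf_path" "$QUANTIZATION"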
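
And a sketch of the ui.py change, assuming the converter UI is Streamlit-based (widget labels and variable names are illustrative). Leaving the token field empty keeps the public-model flow unchanged:

# ui.py (sketch): optional, masked token field for gated/private models.
import streamlit as st

model_name = st.text_input("Hugging Face model repo (e.g. mistralai/Mistral-7B-Instruct-v0.2)")
hf_token = st.text_input("Hugging Face token (leave empty for public models)", type="password")

# Only forward HF_TOKEN to the converter container when one was entered.
env_args = ["-e", f"HF_MODEL_URL={model_name}"]
if hf_token:
    env_args += ["-e", f"HF_TOKEN={hf_token}"]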

Here is my testing after the modification:

$ podman run -it --rm -v models:/converter/converted_models -e HF_MODEL_URL=mistralai/Mistral-7B-Instruct-v0.2 -e HF_TOKEN=*** -e QUANTIZATION=Q4_K_M -e KEEP_ORIGINAL_MODEL="False" localhost/converter

README.md: 100%|███████████████████████████████████████████████████████████████████████████████████| 5.47k/5.47k [00:00<00:00, 21.9MB/s]
.gitattributes: 100%|██████████████████████████████████████████████████████████████████████████████| 1.52k/1.52k [00:00<00:00, 8.79MB/s]
model.safetensors.index.json: 100%|█████████████████████████████████████████████████████████████████| 25.1k/25.1k [00:00<00:00, 357kB/s]
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████| 596/596 [00:00<00:00, 3.67MB/s]
generation_config.json: 100%|███████████████████████████████████████████████████████████████████████████| 111/111 [00:00<00:00, 621kB/s]
pytorch_model.bin.index.json: 100%|████████████████████████████████████████████████████████████████| 23.9k/23.9k [00:00<00:00, 72.1MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████| 414/414 [00:00<00:00, 6.70MB/s]
tokenizer.model: 100%|████████████████████████████████████████████████████████████████████████████████| 493k/493k [00:00<00:00, 861kB/s]
tokenizer_config.json: 100%|███████████████████████████████████████████████████████████████████████| 2.10k/2.10k [00:00<00:00, 12.7MB/s]
tokenizer.json: 100%|███████████████████████████████████████████████████████████████████████████████| 1.80M/1.80M [00:02<00:00, 630kB/s]
model-00001-of-00003.safetensors: 100%|████████████████████████████████████████████████████████████| 4.94G/4.94G [52:42<00:00, 1.56MB/s]
model-00003-of-00003.safetensors: 100%|██████████████████████████████████████████████████████████| 4.54G/4.54G [1:01:03<00:00, 1.24MB/s]
pytorch_model-00001-of-00003.bin: 100%|██████████████████████████████████████████████████████████| 4.94G/4.94G [1:05:53<00:00, 1.25MB/s]
pytorch_model-00002-of-00003.bin: 100%|██████████████████████████████████████████████████████████| 5.00G/5.00G [1:06:22<00:00, 1.26MB/s]
model-00002-of-00003.safetensors: 100%|██████████████████████████████████████████████████████████| 5.00G/5.00G [1:07:19<00:00, 1.24MB/s]
pytorch_model-00003-of-00003.bin: 100%|██████████████████████████████████████████████████████████| 5.06G/5.06G [1:07:36<00:00, 1.25MB/s]
model-00002-of-00003.bin:  99%|██████████████████████████████████████████████████████████▍| 4.95G/5.00G [5:50:49<03:48, 222kB/s]
model-00002-of-00003.bin: 100%|███████████████████████████████████████████████████████████| 5.00G/5.00G [5:54:12<00:00, 229kB/s]
Fetching 16 files: 100%|█████████
INFO:convert:Loading model file /opt/app-root/src/converter/converted_models/mistralai/Mistral-7B-Instruct-v0.2/model-00001-of-00003.safetensors
....
INFO:convert:Loading model file /opt/app-root/src/converter/converted_models/mistralai/Mistral-7B-Instruct-v0.2/model-00002-of-00003.safetensors
INFO:convert:Loading model file /opt/app-root/src/converter/converted_models/mistralai/Mistral-7B-Instruct-v0.2/model-00003-of-00003.safetensors
INFO:convert:params = Params(n_vocab=32000, n_embd=4096, n_layer=32, n_ctx=32768, n_ff=14336, n_head=32, n_head_kv=8, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=1000000.0, f_rope_scale=None, n_ctx_orig=None, rope_finetuned=None, ftype=None, path_model=PosixPath('/opt/app-root/src/converter/converted_models/mistralai/Mistral-7B-Instruct-v0.2'))
INFO:convert:Loaded vocab file PosixPath('/opt/app-root/src/converter/converted_models/mistralai/Mistral-7B-Instruct-v0.2/tokenizer.model'), type 'spm'
INFO:convert:model parameters count : (7241732096, 7241732096, 0) (7.2B)
INFO:convert:Vocab info: <SentencePieceVocab with 32000 base tokens and 0 added tokens>
INFO:convert:Special vocab info: <SpecialVocab with 0 merges, special tokens {'bos': 1, 'eos': 2, 'unk': 0}, add special tokens {'bos': True, 'eos': False}>
INFO:convert:Writing /opt/app-root/src/converter/converted_models/mistralai/Mistral-7B-Instruct-v0.2/Mistral-7B-Instruct-v0.2-F32.gguf, format 0
......
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/opt/app-root/src/converter/converted_models/mistralai/Mistral-7B-Instruct-v0.2/Mistral-7B-Instruct-v0.2-F32.gguf: n_tensors = 291, total_size = 29.0G
INFO:convert:[  1/291] Writing tensor token_embd.weight                      | size  32000 x   4096  | type F32  | T+   0
INFO:convert:[  2/291] Writing tensor blk.0.attn_norm.weight                 | size   4096           | type F32  | T+   1
INFO:convert:[  3/291] Writing tensor blk.0.ffn_down.weight                  | size   4096 x  14336  | type F32  | T+   1
INFO:convert:[  4/291] Writing tensor blk.0.ffn_gate.weight                  | size  14336 x   4096  | type F32  | T+   1
....

Here is the web UI test with a public model (no token needed): [screenshot: 2024-08-14 17-35-34]

melodyliu1986 commented 1 month ago

@rhatdan @MichaelClifford

I made some mistakes in the previous PR #741, so I closed it. Is there any way to delete #741?

I made the changes according to your comments; please review again.

rhatdan commented 1 month ago

LGTM

MichaelClifford commented 1 month ago

Great! Thanks for making those changes, @melodyliu1986. Will re-review now.