containers / ai-lab-recipes

Examples for building and running LLM services and applications locally with Podman
Apache License 2.0

Add the huggingface token parameter, and modify the file path in llama.cpp repo #761

Closed · melodyliu1986 closed this 1 month ago

melodyliu1986 commented 1 month ago

I want to use the mistralai/Mistral-7B-Instruct-v0.2 model, but found there are no GGUF files for it on Hugging Face, so I decided to use the ./convert_models tooling to convert the model myself. In doing so I found a few issues:

  1. 401 Client Error
huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-66b1862a-1bc229376e7f3f4020a3c951;60195d59-03d1-4f26-b3ce-d3b04c2fe2b4)
Cannot access gated repo for url https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/resolve/36d7e540e651b68dac59394d9c3381651df7fb01/.gitattributes

So I added an optional HF_TOKEN parameter in the code. If users download a public model, no token is needed; if they download a gated or private model, they must supply their Hugging Face token (see the first sketch after this list). Impacted files: README.md, download_huggingface.py, run.sh

  2. No convert.py script or quantize binary under llama.cpp
python: can't open file '/opt/app-root/src/converter/llama.cpp/convert.py': [Errno 2] No such file or directory
run.sh: line 23: llama.cpp/quantize: No such file or directory

If we go to https://github.com/ggerganov/llama.cpp.git, we can see that convert.py has been deprecated and moved to examples/convert_legacy_llama.py. I am not sure whether I should just keep the line "python llama.cpp/convert-hf-to-gguf.py /opt/app-root/src/converter/converted_models/$hf_model_url"; for now I replaced convert.py with the correct path, and did the same for llama.cpp/quantize (see the second sketch after this list).

Impacted file: run.sh

  3. No image name was specified in the README.md

So I added "converter" to the "podman run" command.

  4. Added the Hugging Face token parameter to the web UI (see the third sketch after this list). Impacted file: ui.py
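
For reference, a minimal sketch of how the optional token can be threaded through download_huggingface.py, assuming the script uses huggingface_hub's snapshot_download; the flag names here are illustrative, not necessarily the exact ones in the PR:

# download_huggingface.py (sketch, not the exact PR diff)
import argparse
import os

from huggingface_hub import snapshot_download

parser = argparse.ArgumentParser()
parser.add_argument("-m", "--model", required=True,
                    help="Hugging Face repo, e.g. mistralai/Mistral-7B-Instruct-v0.2")
parser.add_argument("-t", "--token", default=os.getenv("HF_TOKEN"),
                    help="Optional token; only needed for gated/private repos")
args = parser.parse_args()

# token=None leaves public downloads unchanged; a token is only
# required for gated repos such as mistralai/Mistral-7B-Instruct-v0.2.
snapshot_download(repo_id=args.model, token=args.token)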
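
A rough sketch of the affected run.sh lines. The gguf path variables are placeholders, and the quantize binary name is an assumption: newer llama.cpp builds ship it as llama-quantize, older ones as plain quantize, so it should match whatever the vendored revision actually produces.

# run.sh (sketch): convert step with convert.py replaced by the
# maintained converter script. $hf_model_url and $QUANTIZATION come
# from the container environment; the gguf paths are placeholders.
python llama.cpp/convert-hf-to-gguf.py /opt/app-root/src/converter/converted_models/$hf_model_url

# Newer llama.cpp builds rename the binary to llama-quantize;
# use whichever name the image actually contains.
llama.cpp/llama-quantize "$f32_gguf_path" "$quantized_gguf_path" "$QUANTIZATION"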
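
And a sketch of the ui.py change, assuming the converter UI is Streamlit-based (widget labels and variable names are illustrative). Leaving the token field empty keeps the public-model flow unchanged:

# ui.py (sketch): optional, masked token field for gated/private models.
import streamlit as st

model_name = st.text_input("Hugging Face model repo (e.g. mistralai/Mistral-7B-Instruct-v0.2)")
hf_token = st.text_input("Hugging Face token (leave empty for public models)", type="password")

# Only forward HF_TOKEN to the converter container when one was entered.
env_args = ["-e", f"HF_MODEL_URL={model_name}"]
if hf_token:
    env_args += ["-e", f"HF_TOKEN={hf_token}"]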

Here is my testing after the modification:

$ podman run -it --rm -v models:/converter/converted_models -e HF_MODEL_URL=mistralai/Mistral-7B-Instruct-v0.2 -e HF_TOKEN=*** -e QUANTIZATION=Q4_K_M -e KEEP_ORIGINAL_MODEL="False" localhost/converter

README.md: 100%|███████████████████████████████████████████████████████████████████████████████████| 5.47k/5.47k [00:00<00:00, 21.9MB/s]
.gitattributes: 100%|██████████████████████████████████████████████████████████████████████████████| 1.52k/1.52k [00:00<00:00, 8.79MB/s]
model.safetensors.index.json: 100%|█████████████████████████████████████████████████████████████████| 25.1k/25.1k [00:00<00:00, 357kB/s]
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████| 596/596 [00:00<00:00, 3.67MB/s]
generation_config.json: 100%|███████████████████████████████████████████████████████████████████████████| 111/111 [00:00<00:00, 621kB/s]
pytorch_model.bin.index.json: 100%|████████████████████████████████████████████████████████████████| 23.9k/23.9k [00:00<00:00, 72.1MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████| 414/414 [00:00<00:00, 6.70MB/s]
tokenizer.model: 100%|████████████████████████████████████████████████████████████████████████████████| 493k/493k [00:00<00:00, 861kB/s]
tokenizer_config.json: 100%|███████████████████████████████████████████████████████████████████████| 2.10k/2.10k [00:00<00:00, 12.7MB/s]
tokenizer.json: 100%|███████████████████████████████████████████████████████████████████████████████| 1.80M/1.80M [00:02<00:00, 630kB/s]
model-00001-of-00003.safetensors: 100%|████████████████████████████████████████████████████████████| 4.94G/4.94G [52:42<00:00, 1.56MB/s]
model-00003-of-00003.safetensors: 100%|██████████████████████████████████████████████████████████| 4.54G/4.54G [1:01:03<00:00, 1.24MB/s]
pytorch_model-00001-of-00003.bin: 100%|██████████████████████████████████████████████████████████| 4.94G/4.94G [1:05:53<00:00, 1.25MB/s]
pytorch_model-00002-of-00003.bin: 100%|██████████████████████████████████████████████████████████| 5.00G/5.00G [1:06:22<00:00, 1.26MB/s]
model-00002-of-00003.safetensors: 100%|██████████████████████████████████████████████████████████| 5.00G/5.00G [1:07:19<00:00, 1.24MB/s]
pytorch_model-00003-of-00003.bin: 100%|██████████████████████████████████████████████████████████| 5.06G/5.06G [1:07:36<00:00, 1.25MB/s]
model-00002-of-00003.bin:  99%|██████████████████████████████████████████████████████████▍| 4.95G/5.00G [5:50:49<03:48, 222kB/s]
model-00002-of-00003.bin: 100%|███████████████████████████████████████████████████████████| 5.00G/5.00G [5:54:12<00:00, 229kB/s]
Fetching 16 files: 100%|█████████
INFO:convert:Loading model file /opt/app-root/src/converter/converted_models/mistralai/Mistral-7B-Instruct-v0.2/model-00001-of-00003.safetensors
....
INFO:convert:Loading model file /opt/app-root/src/converter/converted_models/mistralai/Mistral-7B-Instruct-v0.2/model-00002-of-00003.safetensors
INFO:convert:Loading model file /opt/app-root/src/converter/converted_models/mistralai/Mistral-7B-Instruct-v0.2/model-00003-of-00003.safetensors
INFO:convert:params = Params(n_vocab=32000, n_embd=4096, n_layer=32, n_ctx=32768, n_ff=14336, n_head=32, n_head_kv=8, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=1000000.0, f_rope_scale=None, n_ctx_orig=None, rope_finetuned=None, ftype=None, path_model=PosixPath('/opt/app-root/src/converter/converted_models/mistralai/Mistral-7B-Instruct-v0.2'))
INFO:convert:Loaded vocab file PosixPath('/opt/app-root/src/converter/converted_models/mistralai/Mistral-7B-Instruct-v0.2/tokenizer.model'), type 'spm'
INFO:convert:model parameters count : (7241732096, 7241732096, 0) (7.2B)
INFO:convert:Vocab info: <SentencePieceVocab with 32000 base tokens and 0 added tokens>
INFO:convert:Special vocab info: <SpecialVocab with 0 merges, special tokens {'bos': 1, 'eos': 2, 'unk': 0}, add special tokens {'bos': True, 'eos': False}>
INFO:convert:Writing /opt/app-root/src/converter/converted_models/mistralai/Mistral-7B-Instruct-v0.2/Mistral-7B-Instruct-v0.2-F32.gguf, format 0
......
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/opt/app-root/src/converter/converted_models/mistralai/Mistral-7B-Instruct-v0.2/Mistral-7B-Instruct-v0.2-F32.gguf: n_tensors = 291, total_size = 29.0G
INFO:convert:[  1/291] Writing tensor token_embd.weight                      | size  32000 x   4096  | type F32  | T+   0
INFO:convert:[  2/291] Writing tensor blk.0.attn_norm.weight                 | size   4096           | type F32  | T+   1
INFO:convert:[  3/291] Writing tensor blk.0.ffn_down.weight                  | size   4096 x  14336  | type F32  | T+   1
INFO:convert:[  4/291] Writing tensor blk.0.ffn_gate.weight                  | size  14336 x   4096  | type F32  | T+   1
....

Here is the web UI test with a public model (no token needed): [screenshot: 2024-08-14 17-35-34]

melodyliu1986 commented 1 month ago

@rhatdan @MichaelClifford

I made some mistakes in the previous PR #741, so I closed it. Is there any way to delete #741?

I made the changes according to your comments; please review again.

rhatdan commented 1 month ago

LGTM

MichaelClifford commented 1 month ago

Great! Thanks for making those changes, @melodyliu1986. Will re-review now.