ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Bug: convert_hf_to_gguf.py: error: argument --outtype: invalid choice: 'q4_k_m' (choose from 'f32', 'f16', 'bf16', 'q8_0', 'tq1_0', 'tq2_0', 'auto') #10077

Open awesomecoolraj opened 3 weeks ago

awesomecoolraj commented 3 weeks ago

What happened?

I expected to be able to convert to the usual quantization formats with convert_hf_to_gguf.py. Why do only these formats work?

Name and Version

I don't know the exact version, but it is the latest git clone.

What operating system are you seeing the problem on?

Linux

Relevant log output

INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:models/HappyAIUser_NotQwen/gguf/model_q8_0.gguf: n_tensors = 255, total_size = 3.4G
Writing: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.41G/3.41G [01:09<00:00, 49.4Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to models/HappyAIUser_NotQwen/gguf/model_q8_0.gguf
Pushing GGUF model (q8_0) to HuggingFace...
model_q8_0.gguf: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.42G/3.42G [04:30<00:00, 12.6MB/s]

Converting to GGUF (q4_k_m): models/HappyAIUser_NotQwen/gguf/model_q4_k_m.gguf
usage: convert_hf_to_gguf.py [-h] [--vocab-only] [--outfile OUTFILE] [--outtype {f32,f16,bf16,q8_0,tq1_0,tq2_0,auto}] [--bigendian] [--use-temp-file] [--no-lazy] [--model-name MODEL_NAME] [--verbose]
                             [--split-max-tensors SPLIT_MAX_TENSORS] [--split-max-size SPLIT_MAX_SIZE] [--dry-run] [--no-tensor-first-split] [--metadata METADATA]
                             model
convert_hf_to_gguf.py: error: argument --outtype: invalid choice: 'q4_k_m' (choose from 'f32', 'f16', 'bf16', 'q8_0', 'tq1_0', 'tq2_0', 'auto')
Error during auto-save: Command '['python', 'llama.cpp/convert_hf_to_gguf.py', 'models/temp_hf_for_gguf', '--outfile', 'models/HappyAIUser_NotQwen/gguf/model_q4_k_m.gguf', '--outtype', 'q4_k_m']' returned non-zero exit status 2.
Traceback:
Traceback (most recent call last):
  File "/mnt/c/Users/Admin/Documents/Gradio Unsloth Colab Notebooks/llama_gradio_ui/app.py", line 718, in auto_save_models
    subprocess.run(convert_cmd, check=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['python', 'llama.cpp/convert_hf_to_gguf.py', 'models/temp_hf_for_gguf', '--outfile', 'models/HappyAIUser_NotQwen/gguf/model_q4_k_m.gguf', '--outtype', 'q4_k_m']' returned non-zero exit status 2.

arch-btw commented 3 weeks ago

Please use llama-quantize for that. It's in the llama.cpp directory.
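For example, from the auto-save script the flow is two steps: export an f16 (or q8_0) GGUF with convert_hf_to_gguf.py, then run llama-quantize on it to produce Q4_K_M. A minimal sketch in Python, mirroring the subprocess.run calls in the reporter's app.py; the helper name and paths are placeholders, and it assumes the llama-quantize binary has been built (adjust its path to wherever your build put it):

import subprocess

def convert_and_quantize(hf_dir, out_dir):
    # Hypothetical helper illustrating the two-step conversion.
    f16_path = f"{out_dir}/model_f16.gguf"
    q4_path = f"{out_dir}/model_q4_k_m.gguf"

    # Step 1: convert_hf_to_gguf.py only writes unquantized or simple formats
    # (f32, f16, bf16, q8_0, tq1_0, tq2_0), so export f16 first.
    subprocess.run(
        ["python", "llama.cpp/convert_hf_to_gguf.py", hf_dir,
         "--outfile", f16_path, "--outtype", "f16"],
        check=True,
    )

    # Step 2: k-quants such as Q4_K_M are produced by the llama-quantize binary.
    subprocess.run(
        ["llama.cpp/llama-quantize", f16_path, q4_path, "Q4_K_M"],
        check=True,
    )

Converting to f16 first also keeps a full-precision intermediate around, so you can requantize to other k-quant types later without re-running the HF conversion.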