Closed. Lue-C closed this issue 5 months ago.
You can only convert to GGUF format from models with data in float16, bfloat16 or float32 formats. You can't convert models that are already quantized to a non-GGML format.
What you can do, if you're willing to accept the quality loss of requantizing, is to convert the quantized tensors in your model to one of the formats I mentioned and then convert it to GGUF. Just keep in mind you'll be quantizing, unquantizing, then quantizing again, and quantization is a lossy process.
> You can only convert to GGUF format from models with data in float16, bfloat16 or float32 formats. You can't convert models that are already quantized to a non-GGML format.
I have used the same code above to load and fine-tune the model. This is my bitsandbytes config for loading the model:

```python
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    bnb_8bit_compute_dtype="float16",
)
```

At which point do I change the model so that it is compatible with GGUF formatting from the beginning, without requantizing?
I think you'd have to do your fine-tuning at 16-bit or above, which likely isn't an option since it would at least double the memory requirements. So basically you'd probably have to convert the tensors back up to f16; I'm not sure there's anything else you can do. I'm not that familiar with fine-tuning, though.
How do I convert the tensors back up to fp16 or a compatible format?
Also, I'm not sure where this compatibility issue is occurring (I don't understand the internals of models). This is part of the config.json of the fine-tuned, merged model:

```json
"quantization_config": {
    "bnb_4bit_compute_dtype": "float32",
    "bnb_4bit_quant_type": "fp4",
    "bnb_4bit_use_double_quant": false,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": false,
    "load_in_8bit": true,
    "quant_method": "bitsandbytes"
},
"torch_dtype": "float16",
```

So the dtype here is set to float16 and the quantization compute dtype is float32, both of which seem to be compatible types for conversion.
> You can only convert to GGUF format from models with data in float16, bfloat16 or float32 formats. You can't convert models that are already quantized to a non-GGML format.
>
> What you can do, if you're willing to accept the quality loss of requantizing, is to convert the quantized tensors in your model to one of the formats I mentioned and then convert it to GGUF. Just keep in mind you'll be quantizing, unquantizing, then quantizing again, and quantization is a lossy process.
Thanks for the reply, I see the problem. I did not try converting back because of the expected quality loss. Here is what I did instead: I used the finetune example from llama.cpp with a GGUF as the base model. Afterwards I used export-lora to merge the adapter with the base model into a GGUF. The result can be used like any other GGUF in LangChain, which was my goal. I did the fine-tuning with the example text (Shakespeare), but unfortunately I don't know in which format I have to provide training data for question answering / causal LM. Does anyone have an idea?
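For reference, that workflow looks roughly like the following. The binary names come from the llama.cpp examples, but the file names are placeholders and the exact flags vary between builds, so check `--help` for your version:

```sh
# Fine-tune a LoRA adapter directly against a GGUF base model
./finetune \
  --model-base em_german_7b_v01-f16.gguf \
  --train-data shakespeare.txt \
  --lora-out lora-adapter.gguf

# Merge the adapter into the base model, producing a standalone GGUF
./export-lora \
  --model-base em_german_7b_v01-f16.gguf \
  --lora lora-adapter.gguf \
  --model-out em_german_7b_v01-finetuned.gguf
```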
I'm not sure exactly; if Torch supports that 8-bit quantization, then you could possibly load the model and use Torch operations to convert it to the correct format. I think it would be something like `model["tensorname"] = model["tensorname"].to(dtype=torch.float16)`. Note this is just a hint in a direction that might help you; I don't know enough to give you the exact command. Anyway, if you can find and convert the tensors that are in the wrong format, then you can `torch.save()` them to a different file and then possibly convert that to GGUF format.

Unfortunately, you basically need to know some Python/Torch stuff to pull it off, so if you don't, your best bet is to latch on to someone who does. (Not me though!)
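A minimal sketch of that idea, assuming the merged model is available as a plain state dict. The tensor names and file name here are illustrative, not from the thread, and note the caveat in the comments: a plain dtype cast ignores bitsandbytes' per-tensor scale factors, so this only demonstrates the mechanics of getting rid of int8 tensors:

```python
import torch

# Illustrative stand-in for a loaded state dict containing an int8 tensor.
state_dict = {"layer.weight": torch.randint(-128, 127, (4, 4), dtype=torch.int8)}

# Upcast every integer tensor to float16 so convert.py sees a supported dtype.
# NOTE: real int8 weights from bitsandbytes carry separate quantization scales;
# a bare cast like this ignores them, so proper dequantization should use the
# library's own dequantize helpers instead.
for name, tensor in state_dict.items():
    if tensor.dtype in (torch.int8, torch.uint8):
        state_dict[name] = tensor.to(dtype=torch.float16)

# Save under a new file name, then try convert.py on the result.
torch.save(state_dict, "model_f16.pt")
```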
I fixed it by merging the LoRA with the full base model instead of the bitsandbytes-quantized one. I just reloaded the base model in full precision and then merged; it worked without any error.
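A sketch of that fix with the PEFT API. The adapter path is a placeholder, and merging in float16 needs enough memory to hold the full unquantized weights, so treat this as an outline rather than a drop-in script:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model_id = "jphme/em_german_7b_v01"  # base model from this thread
adapter_dir = "path/to/lora-adapter"      # placeholder path to the trained adapter

# Reload the base model in float16 instead of 8-bit bitsandbytes.
base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16)

# Attach the trained adapter and fold its weights into the base model.
model = PeftModel.from_pretrained(base, adapter_dir)
merged = model.merge_and_unload()

# The saved model now contains only f16 tensors, which convert.py accepts.
merged.save_pretrained("merged-f16")
```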
I have the same issue except my error is "U8". Spent hours trying to figure this out and this thread saved me, thanks!
This issue was closed because it has been inactive for 14 days since being marked as stale.
I encountered the same "I8" error while converting my fine-tuned Mixtral model to a GGUF file. Fortunately, this thread helped me resolve the issue. Thank you!
Hi there, I am fine-tuning the model https://huggingface.co/jphme/em_german_7b_v01 using my own data (I just replaced the questions and answers with dots to keep it short and simple). The model is loaded in 8-bit and a PEFT adapter is added, which is then trained. After merging the weights of the trained adapter and the original model and saving as a full model, I want to convert this model using convert.py.

Expected Behavior
The model is converted to GGUF and saved as a file.
Actual Behavior
I get a key error:

```
KeyError: 'I8'
```
Environment and context
I am running the following code in Colab. The relevant versions are given by the pip commands.
Now I want to convert the merged model to GGUF using
and get
Considering that the error indicates a data-type problem, and that converting the original (non-fine-tuned) model to GGUF works fine, I think the problem is due to the 8-bit quantization.
Did I forget some option when converting or loading the merged model? How can I convert the merged model to GGUF?