ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Gemma models quantized using llama.cpp not working in LM Studio #5706

Closed: rombodawg closed this issue 4 months ago

rombodawg commented 6 months ago

Gemma models that have been quantized using llama.cpp are not working. Please look into the issue.

Error:

"llama.cpp error: 'create_tensor: tensor 'output.weight' not found'"

I will open an issue on the LM Studio GitHub as well, addressing this:

https://github.com/lmstudio-ai/configs/issues/21

System: Ryzen 5 5600X, RTX 3080 GPU, B550 motherboard, 64 GB DDR4 RAM, Windows 10.
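
A hedged aside on that error: Gemma ties its output head to the token embeddings, so a GGUF produced by a recent converter may contain no separate output.weight tensor at all, and a loader built before llama.cpp's Gemma support can then fail with exactly the "tensor 'output.weight' not found" message reported above. A quick way to check what the file actually contains (a sketch assuming the gguf-py scripts shipped in the llama.cpp repo; your-gemma-model.gguf is a placeholder for the file that fails in LM Studio):

# Dump the GGUF tensor list and look for the output/embedding tensors.
# Script name and location may differ between llama.cpp versions.
python gguf-py\scripts\gguf-dump.py your-gemma-model.gguf | findstr /C:"output.weight" /C:"token_embd.weight"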

Yefori-Go commented 6 months ago

Perhaps you should try using the latest llama.cpp to convert the Gemma model:

python .\convert-hf-to-gguf.py models\gemma-2b-it\ --outfile gemma-2b-it-f16.gguf
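
For completeness, a minimal end-to-end sketch (assuming a current llama.cpp checkout with the quantize tool built; model paths and file names are placeholders): convert the HF weights to an f16 GGUF first, then quantize that file before loading it in LM Studio.

python .\convert-hf-to-gguf.py models\gemma-2b-it\ --outfile gemma-2b-it-f16.gguf --outtype f16
# Quantize the f16 GGUF to Q4_K_M; the quantize binary sits wherever your build put it (e.g. .\build\bin\).
.\quantize.exe gemma-2b-it-f16.gguf gemma-2b-it-Q4_K_M.gguf Q4_K_M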

JohannesGaessler commented 6 months ago

I can only speak for myself, but I 100% refuse to debug a problem unless it can be reproduced entirely with open-source code.

rombodawg commented 6 months ago

No, it literally doesn't work.

I just built this version of llama.cpp, and that .py script doesn't work for Gemma.

Plus, that's not the script you're even supposed to use according to the documentation. You're supposed to use convert.py.

E:\Open_source_ai_chatbot\Llamacpp-3\llama.cpp>python E:\Open_source_ai_chatbot\Llamacpp-mixtral\llamacpp-clone-mixtral\convert-hf-to-gguf.py E:\Open_source_ai_chatbot\OOBA_10\text-generation-webui-main\models\Gemma-EveryoneLLM-7b-test --outfile Gemma-EveryoneLLM-7b-test.gguf --outtype f16
Loading model: Gemma-EveryoneLLM-7b-test
Traceback (most recent call last):
  File "E:\Open_source_ai_chatbot\Llamacpp-mixtral\llamacpp-clone-mixtral\convert-hf-to-gguf.py", line 1033, in <module>
    model_instance = model_class(dir_model, ftype_map[args.outtype], fname_out, args.bigendian)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Open_source_ai_chatbot\Llamacpp-mixtral\llamacpp-clone-mixtral\convert-hf-to-gguf.py", line 48, in __init__
    self.model_arch = self._get_model_architecture()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Open_source_ai_chatbot\Llamacpp-mixtral\llamacpp-clone-mixtral\convert-hf-to-gguf.py", line 225, in _get_model_architecture
    raise NotImplementedError(f'Architecture "{arch}" not supported!')
NotImplementedError: Architecture "GemmaForCausalLM" not supported!
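
Worth noting: the command in the traceback invokes convert-hf-to-gguf.py from the older Llamacpp-mixtral clone, not from the freshly built Llamacpp-3 tree. A quick hedged check (assuming a git checkout; findstr is Windows' grep) to confirm that the script actually being run knows about Gemma:

cd llama.cpp
git pull
# If this prints nothing, this checkout's converter predates Gemma support.
findstr /C:"GemmaForCausalLM" convert-hf-to-gguf.py
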
rombodawg commented 6 months ago

@JohannesGaessler I totally understand that. Hopefully OpenAI is willing to reach out and work with you to fix this.

rombodawg commented 6 months ago

I'm uploading the model files for the merges if anyone wants to do some debugging. They should be up in the next 10 hours or so; sorry, slow internet.

Follow the multi-repo threads and check out my model for debugging.

Thread links:
https://github.com/lmstudio-ai/configs/issues/21
https://github.com/ggerganov/llama.cpp/issues/5706
https://github.com/arcee-ai/mergekit/issues/181
https://github.com/oobabooga/text-generation-webui/issues/5562

https://huggingface.co/rombodawg/Gemme-Merge-Test-7b

hiepxanh commented 6 months ago

@rombodawg did you try the latest version? https://github.com/ggerganov/llama.cpp/issues/6051

It is already supported.

github-actions[bot] commented 4 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

DementedWeasel1971 commented 4 months ago

I think people, like me, are considering other options. I will, however, keep watching the release notes to see when this is fixed.