byroneverson / llm.cpp

Fork of llama.cpp, extended for GPT-NeoX, RWKV-v4, and Falcon models
MIT License

Model exporters are producing incompatible models due to the lack of a vocabulary score #5

Closed — philpax closed this issue 1 year ago

philpax commented 1 year ago

Hi there!

I'm one of the maintainers of https://github.com/rustformers/llama-rs, and we've recently been expanding our model support to cover GPT-NeoX as well as other models. As part of this, we've been testing with models on HuggingFace as appropriate.

We were testing with https://huggingface.co/byroneverson/ggml-stablelm-base-alpha-3b-q4_0 and noticed that the model was exported as a GGJT-format file where each vocabulary token was missing its f32 score - that is, (len, bytes) instead of (len, bytes, score).

I had a quick look at the repo here, and from what I can tell, both the conversion scripts and the C++ loader skip the writing/reading of the vocabulary score.

Unfortunately, only the GGML format has scoreless tokens. Both GGMF and GGJT require tokens to have a score. As a result, models produced using this workflow are invalid GGMF/GGJT models for other non-gptneox.cpp loaders.

This may also cause problems in the future with ggml's GPT-NeoX model example, which uses scoreless GGML.

My suggested fix would be one of the following:

- write a placeholder f32 score (e.g. 0.0) for each token when exporting GGMF/GGJT files, or
- export these models in the scoreless GGML format instead.

byroneverson commented 1 year ago

Thank you for pointing this out. The newest push now uses 0.0 placeholder scores again for the GGJT format. I am currently re-uploading the quantized models to Hugging Face, so they should all be available fairly soon.

I will also be uploading a q4_0 quantized version of TRL-Lib's Stack LLaMa if you would like to test it as well. Meta's original licensing/access applies to Stack LLaMa, so be sure to apply for access if you have not already. (I imagine you have at this point, so this is more for anyone else who may come across this.) https://ai.facebook.com/blog/large-language-model-llama-meta-ai/