Model exporters are producing incompatible models due to the lack of a vocabulary score

Hi there!

I'm one of the maintainers of https://github.com/rustformers/llama-rs, and we've recently been expanding our model support to cover GPT-NeoX as well as other models. As part of this, we've been testing with models on HuggingFace as appropriate.

We were testing with https://huggingface.co/byroneverson/ggml-stablelm-base-alpha-3b-q4_0 and noticed that the model was exported as a GGJT-format file where each vocabulary token was missing its f32 score - that is, (len, bytes) instead of (len, bytes, score).

I had a quick look at the repo here, and from what I can tell, the scripts and C++ skip over the production/reading of the vocabulary score.

Unfortunately, only the GGML format has scoreless tokens. Both GGMF and GGJT require tokens to have a score. As a result, models produced using this workflow are invalid GGMF/GGJT models for other non-gptneox.cpp loaders.

This may also cause problems in the future with ggml's GPT-NeoX model example, which uses scoreless GGML.

My suggested fixes are one of the following:

use the GGML format
update your script and C++ to write a 0.0 score and to skip over it during the read

byroneverson / llm.cpp

Model exporters are producing incompatible models due to the lack of a vocabulary score #5