Add optional MLP bias for ARCH_LLAMA to support Granite models. Partially addresses ggerganov/llama.cpp/issues/7116. Further changes are still needed to properly support Granite.
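A minimal sketch of what "optional MLP bias" means at load time: the loader looks up the bias tensor but tolerates its absence, since plain LLaMA checkpoints do not ship MLP biases. This is an illustrative Python helper, not the actual C++ loader; the tensor names follow the usual gguf `blk.N.ffn_*` convention, and `load_mlp_weights` is a hypothetical name.

```python
def load_mlp_weights(tensors: dict, layer: int):
    """Fetch the ffn_down weight and, if present, its optional bias.

    `tensors` stands in for the model's tensor map; real llama.cpp
    does this in C++ with a "not required" flag on create_tensor.
    """
    prefix = f"blk.{layer}."
    weight = tensors[prefix + "ffn_down.weight"]       # always required
    bias = tensors.get(prefix + "ffn_down.bias")       # optional: None if absent
    return weight, bias
```

The same pattern applies to the other MLP tensors (ffn_gate, ffn_up): the bias is added to the compute graph only when it was found in the file.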
llama: honor add_space_prefix from the model configuration
Propagate the add_space_prefix setting from the HF model configuration to the gguf file and honor it in the gpt2 tokenizer.
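The propagation step can be sketched as follows. This is a simplified stand-in for the conversion script: the HF side exposes the flag as `add_prefix_space` in its tokenizer configuration, and the gguf key `tokenizer.ggml.add_space_prefix` is where the loader reads it back; the plain dict here stands in for the gguf KV writer.

```python
def propagate_add_space_prefix(hf_tokenizer_config: dict, gguf_kv: dict) -> None:
    """Copy the HF add_prefix_space flag into the gguf metadata.

    If the HF config does not set the flag, no key is written and the
    loader falls back to its default behavior.
    """
    if "add_prefix_space" in hf_tokenizer_config:
        gguf_kv["tokenizer.ggml.add_space_prefix"] = bool(
            hf_tokenizer_config["add_prefix_space"]
        )
```

At inference time the gpt2 tokenizer consults this key to decide whether to prepend a space before tokenizing the prompt.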
llama: add support for small granite models
This works only for the small models, 3b and 8b.
The convert-hf-to-gguf.py script uses the vocabulary size of the Granite models to detect them and set the correct configuration.
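The detection heuristic can be sketched like this. The exact vocabulary size used as the fingerprint is an assumption here (the small Granite code models ship an unusually sized vocabulary that distinguishes them from stock LLaMA checkpoints); `is_granite` is a hypothetical helper, not a function from the script.

```python
# Assumed fingerprint: vocabulary size of the small Granite code models.
GRANITE_VOCAB_SIZE = 49152  # illustrative assumption, not taken from the script

def is_granite(vocab_size: int) -> bool:
    """Heuristically detect a small Granite model by its vocabulary size."""
    return vocab_size == GRANITE_VOCAB_SIZE

def select_config(vocab_size: int) -> dict:
    """Pick conversion overrides based on the detection result.

    The keys here mirror the two changes above: optional MLP bias and
    no leading-space prefix for the tokenizer.
    """
    if is_granite(vocab_size):
        return {"mlp_bias": True, "add_space_prefix": False}
    return {"mlp_bias": False, "add_space_prefix": True}
```

Detecting by vocabulary size is fragile (another model family could collide on the same size), which is one reason the description notes that more changes are needed to properly support Granite.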