Prerequisites
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed). -- I searched for xglm but couldn't find any mention at all in the whole project.
- [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.
Feature Description
Add support for Facebook's XGLM models (e.g. `xglm-564M`), including converting them to gguf and then running them with llama.cpp. The HuggingFace implementation docs are here, while the model specs are here, for reference.
Motivation
XGLM models have good performance on specific tasks despite their size. For example, the 564M model is small enough to run on any device, and it can understand the context of a sentence quite well and extract specific parts of it (a useful use case for an assistant). I was able to run `xglm-564M` using HuggingFace's framework from within Termux; however, it doesn't support (efficient) quantization on CPU, so the model ends up using 3 GB of RAM and is quite slow. Also, I would like to embed an XGLM model in an Android app, and doing so with llama.cpp would be much simpler (and more efficient) than packaging Python, transformers, and all the other dependencies of the HuggingFace implementation.
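
For reference, the way I currently run the model with transformers looks roughly like the sketch below; the prompt is only a placeholder for the kind of extraction task I have in mind:

```python
# Rough sketch of how I currently run xglm-564M with transformers on CPU
# (unquantized); the prompt is just a placeholder for the extraction use case.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/xglm-564M")
model = AutoModelForCausalLM.from_pretrained("facebook/xglm-564M")  # ~3 GB resident in RAM

prompt = "Meeting notes: the review is scheduled for March 3rd.\nDate mentioned:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```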
Possible Implementation
The `convert-hf-to-gguf.py` script would need support for `XGLMForCausalLM`, and something more would probably have to be implemented in llama.cpp itself if the architecture requires a new layer type. I would be open to helping with the implementation; however, I don't know much about LLM architectures in general, or about the llama.cpp project specifically. I looked at one recently merged PR that added support for a new model structure, but couldn't really understand what was going on. Do you have some documentation on how to add new model types? Would it be simple to add XGLM, or does it have a nonstandard architecture?
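
To make my guess a bit more concrete: from skimming that PR, I imagine the converter side would need a new class roughly like the sketch below. All of the names here are assumptions on my part (in particular, `gguf.MODEL_ARCH.XGLM` and its tensor mappings don't exist in `gguf-py` yet), so this only illustrates where the support would plug in, not working code; the config values it reads are listed further down in this issue.

```python
# Hypothetical fragment that would live inside convert-hf-to-gguf.py.
# gguf.MODEL_ARCH.XGLM does not exist yet, and the writer calls below only
# mirror what other architectures in the script seem to do -- all a guess.
@Model.register("XGLMForCausalLM")
class XGLMModel(Model):
    model_arch = gguf.MODEL_ARCH.XGLM  # new constant that would need adding to gguf-py

    def set_gguf_parameters(self):
        hp = self.hparams  # contents of config.json (shown below)
        self.gguf_writer.add_context_length(hp["max_position_embeddings"])
        self.gguf_writer.add_embedding_length(hp["d_model"])
        self.gguf_writer.add_block_count(hp["num_layers"])
        self.gguf_writer.add_feed_forward_length(hp["ffn_dim"])
        self.gguf_writer.add_head_count(hp["attention_heads"])
```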
if some new layer type needs to be implemented. I would be open to help with implementation, however I don't know much about LLM architecture in general, and about the llama.cpp project specifically. I looked at one recently merged PR that added support for a new model structure, but couldn't really understand what was going on. Do you have some documentation on how to add new model types? Would it be simple to add XGLM or does it have a nonstandard architecture?These are the layers in the XGLM-564M model:
```
model.embed_tokens.weight
model.layers.0.self_attn.k_proj.weight
model.layers.0.self_attn.k_proj.bias
model.layers.0.self_attn.v_proj.weight
model.layers.0.self_attn.v_proj.bias
model.layers.0.self_attn.q_proj.weight
model.layers.0.self_attn.q_proj.bias
model.layers.0.self_attn.out_proj.weight
model.layers.0.self_attn.out_proj.bias
model.layers.0.self_attn_layer_norm.weight
model.layers.0.self_attn_layer_norm.bias
model.layers.0.fc1.weight
model.layers.0.fc1.bias
model.layers.0.fc2.weight
model.layers.0.fc2.bias
model.layers.0.final_layer_norm.weight
model.layers.0.final_layer_norm.bias
model.layers.1.self_attn.k_proj.weight
model.layers.1.self_attn.k_proj.bias
model.layers.1.self_attn.v_proj.weight
model.layers.1.self_attn.v_proj.bias
model.layers.1.self_attn.q_proj.weight
model.layers.1.self_attn.q_proj.bias
model.layers.1.self_attn.out_proj.weight
model.layers.1.self_attn.out_proj.bias
model.layers.1.self_attn_layer_norm.weight
model.layers.1.self_attn_layer_norm.bias
model.layers.1.fc1.weight
model.layers.1.fc1.bias
model.layers.1.fc2.weight
model.layers.1.fc2.bias
model.layers.1.final_layer_norm.weight
model.layers.1.final_layer_norm.bias
model.layers.2.self_attn.k_proj.weight
model.layers.2.self_attn.k_proj.bias
model.layers.2.self_attn.v_proj.weight
model.layers.2.self_attn.v_proj.bias
model.layers.2.self_attn.q_proj.weight
model.layers.2.self_attn.q_proj.bias
model.layers.2.self_attn.out_proj.weight
model.layers.2.self_attn.out_proj.bias
model.layers.2.self_attn_layer_norm.weight
model.layers.2.self_attn_layer_norm.bias
model.layers.2.fc1.weight
model.layers.2.fc1.bias
model.layers.2.fc2.weight
model.layers.2.fc2.bias
model.layers.2.final_layer_norm.weight
model.layers.2.final_layer_norm.bias
model.layers.3.self_attn.k_proj.weight
model.layers.3.self_attn.k_proj.bias
model.layers.3.self_attn.v_proj.weight
model.layers.3.self_attn.v_proj.bias
model.layers.3.self_attn.q_proj.weight
model.layers.3.self_attn.q_proj.bias
model.layers.3.self_attn.out_proj.weight
model.layers.3.self_attn.out_proj.bias
model.layers.3.self_attn_layer_norm.weight
model.layers.3.self_attn_layer_norm.bias
model.layers.3.fc1.weight
model.layers.3.fc1.bias
model.layers.3.fc2.weight
model.layers.3.fc2.bias
model.layers.3.final_layer_norm.weight
model.layers.3.final_layer_norm.bias
model.layers.4.self_attn.k_proj.weight
model.layers.4.self_attn.k_proj.bias
model.layers.4.self_attn.v_proj.weight
model.layers.4.self_attn.v_proj.bias
model.layers.4.self_attn.q_proj.weight
model.layers.4.self_attn.q_proj.bias
model.layers.4.self_attn.out_proj.weight
model.layers.4.self_attn.out_proj.bias
model.layers.4.self_attn_layer_norm.weight
model.layers.4.self_attn_layer_norm.bias
model.layers.4.fc1.weight
model.layers.4.fc1.bias
model.layers.4.fc2.weight
model.layers.4.fc2.bias
model.layers.4.final_layer_norm.weight
model.layers.4.final_layer_norm.bias
model.layers.5.self_attn.k_proj.weight
model.layers.5.self_attn.k_proj.bias
model.layers.5.self_attn.v_proj.weight
model.layers.5.self_attn.v_proj.bias
model.layers.5.self_attn.q_proj.weight
model.layers.5.self_attn.q_proj.bias
model.layers.5.self_attn.out_proj.weight
model.layers.5.self_attn.out_proj.bias
model.layers.5.self_attn_layer_norm.weight
model.layers.5.self_attn_layer_norm.bias
model.layers.5.fc1.weight
model.layers.5.fc1.bias
model.layers.5.fc2.weight
model.layers.5.fc2.bias
model.layers.5.final_layer_norm.weight
model.layers.5.final_layer_norm.bias
model.layers.6.self_attn.k_proj.weight
model.layers.6.self_attn.k_proj.bias
model.layers.6.self_attn.v_proj.weight
model.layers.6.self_attn.v_proj.bias
model.layers.6.self_attn.q_proj.weight
model.layers.6.self_attn.q_proj.bias
model.layers.6.self_attn.out_proj.weight
model.layers.6.self_attn.out_proj.bias
model.layers.6.self_attn_layer_norm.weight
model.layers.6.self_attn_layer_norm.bias
model.layers.6.fc1.weight
model.layers.6.fc1.bias
model.layers.6.fc2.weight
model.layers.6.fc2.bias
model.layers.6.final_layer_norm.weight
model.layers.6.final_layer_norm.bias
model.layers.7.self_attn.k_proj.weight
model.layers.7.self_attn.k_proj.bias
model.layers.7.self_attn.v_proj.weight
model.layers.7.self_attn.v_proj.bias
model.layers.7.self_attn.q_proj.weight
model.layers.7.self_attn.q_proj.bias
model.layers.7.self_attn.out_proj.weight
model.layers.7.self_attn.out_proj.bias
model.layers.7.self_attn_layer_norm.weight
model.layers.7.self_attn_layer_norm.bias
model.layers.7.fc1.weight
model.layers.7.fc1.bias
model.layers.7.fc2.weight
model.layers.7.fc2.bias
model.layers.7.final_layer_norm.weight
model.layers.7.final_layer_norm.bias
model.layers.8.self_attn.k_proj.weight
model.layers.8.self_attn.k_proj.bias
model.layers.8.self_attn.v_proj.weight
model.layers.8.self_attn.v_proj.bias
model.layers.8.self_attn.q_proj.weight
model.layers.8.self_attn.q_proj.bias
model.layers.8.self_attn.out_proj.weight
model.layers.8.self_attn.out_proj.bias
model.layers.8.self_attn_layer_norm.weight
model.layers.8.self_attn_layer_norm.bias
model.layers.8.fc1.weight
model.layers.8.fc1.bias
model.layers.8.fc2.weight
model.layers.8.fc2.bias
model.layers.8.final_layer_norm.weight
model.layers.8.final_layer_norm.bias
model.layers.9.self_attn.k_proj.weight
model.layers.9.self_attn.k_proj.bias
model.layers.9.self_attn.v_proj.weight
model.layers.9.self_attn.v_proj.bias
model.layers.9.self_attn.q_proj.weight
model.layers.9.self_attn.q_proj.bias
model.layers.9.self_attn.out_proj.weight
model.layers.9.self_attn.out_proj.bias
model.layers.9.self_attn_layer_norm.weight
model.layers.9.self_attn_layer_norm.bias
model.layers.9.fc1.weight
model.layers.9.fc1.bias
model.layers.9.fc2.weight
model.layers.9.fc2.bias
model.layers.9.final_layer_norm.weight
model.layers.9.final_layer_norm.bias
model.layers.10.self_attn.k_proj.weight
model.layers.10.self_attn.k_proj.bias
model.layers.10.self_attn.v_proj.weight
model.layers.10.self_attn.v_proj.bias
model.layers.10.self_attn.q_proj.weight
model.layers.10.self_attn.q_proj.bias
model.layers.10.self_attn.out_proj.weight
model.layers.10.self_attn.out_proj.bias
model.layers.10.self_attn_layer_norm.weight
model.layers.10.self_attn_layer_norm.bias
model.layers.10.fc1.weight
model.layers.10.fc1.bias
model.layers.10.fc2.weight
model.layers.10.fc2.bias
model.layers.10.final_layer_norm.weight
model.layers.10.final_layer_norm.bias
model.layers.11.self_attn.k_proj.weight
model.layers.11.self_attn.k_proj.bias
model.layers.11.self_attn.v_proj.weight
model.layers.11.self_attn.v_proj.bias
model.layers.11.self_attn.q_proj.weight
model.layers.11.self_attn.q_proj.bias
model.layers.11.self_attn.out_proj.weight
model.layers.11.self_attn.out_proj.bias
model.layers.11.self_attn_layer_norm.weight
model.layers.11.self_attn_layer_norm.bias
model.layers.11.fc1.weight
model.layers.11.fc1.bias
model.layers.11.fc2.weight
model.layers.11.fc2.bias
model.layers.11.final_layer_norm.weight
model.layers.11.final_layer_norm.bias
model.layers.12.self_attn.k_proj.weight
model.layers.12.self_attn.k_proj.bias
model.layers.12.self_attn.v_proj.weight
model.layers.12.self_attn.v_proj.bias
model.layers.12.self_attn.q_proj.weight
model.layers.12.self_attn.q_proj.bias
model.layers.12.self_attn.out_proj.weight
model.layers.12.self_attn.out_proj.bias
model.layers.12.self_attn_layer_norm.weight
model.layers.12.self_attn_layer_norm.bias
model.layers.12.fc1.weight
model.layers.12.fc1.bias
model.layers.12.fc2.weight
model.layers.12.fc2.bias
model.layers.12.final_layer_norm.weight
model.layers.12.final_layer_norm.bias
model.layers.13.self_attn.k_proj.weight
model.layers.13.self_attn.k_proj.bias
model.layers.13.self_attn.v_proj.weight
model.layers.13.self_attn.v_proj.bias
model.layers.13.self_attn.q_proj.weight
model.layers.13.self_attn.q_proj.bias
model.layers.13.self_attn.out_proj.weight
model.layers.13.self_attn.out_proj.bias
model.layers.13.self_attn_layer_norm.weight
model.layers.13.self_attn_layer_norm.bias
model.layers.13.fc1.weight
model.layers.13.fc1.bias
model.layers.13.fc2.weight
model.layers.13.fc2.bias
model.layers.13.final_layer_norm.weight
model.layers.13.final_layer_norm.bias
model.layers.14.self_attn.k_proj.weight
model.layers.14.self_attn.k_proj.bias
model.layers.14.self_attn.v_proj.weight
model.layers.14.self_attn.v_proj.bias
model.layers.14.self_attn.q_proj.weight
model.layers.14.self_attn.q_proj.bias
model.layers.14.self_attn.out_proj.weight
model.layers.14.self_attn.out_proj.bias
model.layers.14.self_attn_layer_norm.weight
model.layers.14.self_attn_layer_norm.bias
model.layers.14.fc1.weight
model.layers.14.fc1.bias
model.layers.14.fc2.weight
model.layers.14.fc2.bias
model.layers.14.final_layer_norm.weight
model.layers.14.final_layer_norm.bias
model.layers.15.self_attn.k_proj.weight
model.layers.15.self_attn.k_proj.bias
model.layers.15.self_attn.v_proj.weight
model.layers.15.self_attn.v_proj.bias
model.layers.15.self_attn.q_proj.weight
model.layers.15.self_attn.q_proj.bias
model.layers.15.self_attn.out_proj.weight
model.layers.15.self_attn.out_proj.bias
model.layers.15.self_attn_layer_norm.weight
model.layers.15.self_attn_layer_norm.bias
model.layers.15.fc1.weight
model.layers.15.fc1.bias
model.layers.15.fc2.weight
model.layers.15.fc2.bias
model.layers.15.final_layer_norm.weight
model.layers.15.final_layer_norm.bias
model.layers.16.self_attn.k_proj.weight
model.layers.16.self_attn.k_proj.bias
model.layers.16.self_attn.v_proj.weight
model.layers.16.self_attn.v_proj.bias
model.layers.16.self_attn.q_proj.weight
model.layers.16.self_attn.q_proj.bias
model.layers.16.self_attn.out_proj.weight
model.layers.16.self_attn.out_proj.bias
model.layers.16.self_attn_layer_norm.weight
model.layers.16.self_attn_layer_norm.bias
model.layers.16.fc1.weight
model.layers.16.fc1.bias
model.layers.16.fc2.weight
model.layers.16.fc2.bias
model.layers.16.final_layer_norm.weight
model.layers.16.final_layer_norm.bias
model.layers.17.self_attn.k_proj.weight
model.layers.17.self_attn.k_proj.bias
model.layers.17.self_attn.v_proj.weight
model.layers.17.self_attn.v_proj.bias
model.layers.17.self_attn.q_proj.weight
model.layers.17.self_attn.q_proj.bias
model.layers.17.self_attn.out_proj.weight
model.layers.17.self_attn.out_proj.bias
model.layers.17.self_attn_layer_norm.weight
model.layers.17.self_attn_layer_norm.bias
model.layers.17.fc1.weight
model.layers.17.fc1.bias
model.layers.17.fc2.weight
model.layers.17.fc2.bias
model.layers.17.final_layer_norm.weight
model.layers.17.final_layer_norm.bias
model.layers.18.self_attn.k_proj.weight
model.layers.18.self_attn.k_proj.bias
model.layers.18.self_attn.v_proj.weight
model.layers.18.self_attn.v_proj.bias
model.layers.18.self_attn.q_proj.weight
model.layers.18.self_attn.q_proj.bias
model.layers.18.self_attn.out_proj.weight
model.layers.18.self_attn.out_proj.bias
model.layers.18.self_attn_layer_norm.weight
model.layers.18.self_attn_layer_norm.bias
model.layers.18.fc1.weight
model.layers.18.fc1.bias
model.layers.18.fc2.weight
model.layers.18.fc2.bias
model.layers.18.final_layer_norm.weight
model.layers.18.final_layer_norm.bias
model.layers.19.self_attn.k_proj.weight
model.layers.19.self_attn.k_proj.bias
model.layers.19.self_attn.v_proj.weight
model.layers.19.self_attn.v_proj.bias
model.layers.19.self_attn.q_proj.weight
model.layers.19.self_attn.q_proj.bias
model.layers.19.self_attn.out_proj.weight
model.layers.19.self_attn.out_proj.bias
model.layers.19.self_attn_layer_norm.weight
model.layers.19.self_attn_layer_norm.bias
model.layers.19.fc1.weight
model.layers.19.fc1.bias
model.layers.19.fc2.weight
model.layers.19.fc2.bias
model.layers.19.final_layer_norm.weight
model.layers.19.final_layer_norm.bias
model.layers.20.self_attn.k_proj.weight
model.layers.20.self_attn.k_proj.bias
model.layers.20.self_attn.v_proj.weight
model.layers.20.self_attn.v_proj.bias
model.layers.20.self_attn.q_proj.weight
model.layers.20.self_attn.q_proj.bias
model.layers.20.self_attn.out_proj.weight
model.layers.20.self_attn.out_proj.bias
model.layers.20.self_attn_layer_norm.weight
model.layers.20.self_attn_layer_norm.bias
model.layers.20.fc1.weight
model.layers.20.fc1.bias
model.layers.20.fc2.weight
model.layers.20.fc2.bias
model.layers.20.final_layer_norm.weight
model.layers.20.final_layer_norm.bias
model.layers.21.self_attn.k_proj.weight
model.layers.21.self_attn.k_proj.bias
model.layers.21.self_attn.v_proj.weight
model.layers.21.self_attn.v_proj.bias
model.layers.21.self_attn.q_proj.weight
model.layers.21.self_attn.q_proj.bias
model.layers.21.self_attn.out_proj.weight
model.layers.21.self_attn.out_proj.bias
model.layers.21.self_attn_layer_norm.weight
model.layers.21.self_attn_layer_norm.bias
model.layers.21.fc1.weight
model.layers.21.fc1.bias
model.layers.21.fc2.weight
model.layers.21.fc2.bias
model.layers.21.final_layer_norm.weight
model.layers.21.final_layer_norm.bias
model.layers.22.self_attn.k_proj.weight
model.layers.22.self_attn.k_proj.bias
model.layers.22.self_attn.v_proj.weight
model.layers.22.self_attn.v_proj.bias
model.layers.22.self_attn.q_proj.weight
model.layers.22.self_attn.q_proj.bias
model.layers.22.self_attn.out_proj.weight
model.layers.22.self_attn.out_proj.bias
model.layers.22.self_attn_layer_norm.weight
model.layers.22.self_attn_layer_norm.bias
model.layers.22.fc1.weight
model.layers.22.fc1.bias
model.layers.22.fc2.weight
model.layers.22.fc2.bias
model.layers.22.final_layer_norm.weight
model.layers.22.final_layer_norm.bias
model.layers.23.self_attn.k_proj.weight
model.layers.23.self_attn.k_proj.bias
model.layers.23.self_attn.v_proj.weight
model.layers.23.self_attn.v_proj.bias
model.layers.23.self_attn.q_proj.weight
model.layers.23.self_attn.q_proj.bias
model.layers.23.self_attn.out_proj.weight
model.layers.23.self_attn.out_proj.bias
model.layers.23.self_attn_layer_norm.weight
model.layers.23.self_attn_layer_norm.bias
model.layers.23.fc1.weight
model.layers.23.fc1.bias
model.layers.23.fc2.weight
model.layers.23.fc2.bias
model.layers.23.final_layer_norm.weight
model.layers.23.final_layer_norm.bias
model.layer_norm.weight
model.layer_norm.bias
lm_head.weight
```
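
If I read llama.cpp's gguf naming conventions correctly, those tensors would map onto the usual names roughly as in the sketch below; this mapping is only my guess and I haven't verified it (`{bid}` stands for the layer index, 0-23):

```python
# My unverified guess at how the HF tensor names above could map to gguf names;
# "{bid}" is the block/layer index (0-23).
XGLM_TENSOR_MAP = {
    "model.embed_tokens":                      "token_embd",
    "model.layers.{bid}.self_attn.q_proj":     "blk.{bid}.attn_q",
    "model.layers.{bid}.self_attn.k_proj":     "blk.{bid}.attn_k",
    "model.layers.{bid}.self_attn.v_proj":     "blk.{bid}.attn_v",
    "model.layers.{bid}.self_attn.out_proj":   "blk.{bid}.attn_output",
    "model.layers.{bid}.self_attn_layer_norm": "blk.{bid}.attn_norm",
    "model.layers.{bid}.fc1":                  "blk.{bid}.ffn_up",
    "model.layers.{bid}.fc2":                  "blk.{bid}.ffn_down",
    "model.layers.{bid}.final_layer_norm":     "blk.{bid}.ffn_norm",
    "model.layer_norm":                        "output_norm",
    "lm_head":                                 "output",
}
```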
```json { "activation_dropout": 0, "activation_function": "gelu", "architectures": [ "XGLMForCausalLM" ], "attention_dropout": 0.1, "attention_heads": 16, "bos_token_id": 0, "d_model": 1024, "decoder_start_token_id": 2, "dropout": 0.1, "eos_token_id": 2, "ffn_dim": 4096, "init_std": 0.02, "layerdrop": 0.0, "max_position_embeddings": 2048, "model_type": "xglm", "num_layers": 24, "pad_token_id": 1, "scale_embedding": true, "transformers_version": "4.16.0.dev0", "use_cache": true, "vocab_size": 256008 } ```config.json
file for the 564M model (taken from here):
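
And just to spell out the numbers implied by that config (my own arithmetic, so worth double-checking; the embedding-scale line reflects my reading of how `scale_embedding` is handled in the HF implementation and may be wrong):

```python
# Values derived from the config.json above; my own arithmetic, double-check it.
import json
import math

with open("config.json") as f:
    cfg = json.load(f)

n_layer = cfg["num_layers"]             # 24
n_embd = cfg["d_model"]                 # 1024
n_head = cfg["attention_heads"]         # 16
head_dim = n_embd // n_head             # 64
n_ff = cfg["ffn_dim"]                   # 4096
n_ctx = cfg["max_position_embeddings"]  # 2048
n_vocab = cfg["vocab_size"]             # 256008

# If I read the HF implementation right, scale_embedding means the token
# embeddings are multiplied by sqrt(d_model), which a conversion would need to handle.
embed_scale = math.sqrt(n_embd) if cfg["scale_embedding"] else 1.0  # 32.0

print(n_layer, n_embd, n_head, head_dim, n_ff, n_ctx, n_vocab, embed_scale)
```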