ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Support for XGLM models #5097

Closed Stypox closed 7 months ago

Stypox commented 9 months ago

Feature Description

Add support for Facebook's XGLM models (e.g. xglm-564M), including converting them to GGUF and then running them with llama.cpp. For reference, the HuggingFace implementation docs are here, and the model specs are here.

Motivation

XGLM models perform well on specific tasks despite their size. For example, the 564M model is small enough to run on any device, and it can understand the context of a sentence quite well and extract specific parts of it (a useful capability for an assistant). I could run xglm-564M using HuggingFace's framework from within Termux, but that framework doesn't support (efficient) quantization on CPU, so the model ends up using 3 GB of RAM and running quite slowly. I would also like to embed an XGLM model in an Android app, and doing so with llama.cpp would be much simpler (and more efficient) than packaging Python, transformers, and all the other dependencies of the HuggingFace implementation.
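For context on the memory figures above, here is a back-of-envelope sketch of the weight footprint at different precisions. The 564M parameter count and the bits-per-weight figures are my assumptions (llama.cpp's Q8_0 and Q4_0 block formats cost roughly 8.5 and 4.5 bits per weight once scales are included); runtime overhead such as the KV cache and activations comes on top of this.

```python
# Approximate weight-only RAM for a 564M-parameter model at various precisions.
# Bits-per-weight values for Q8_0/Q4_0 are approximations including block scales.
n_params = 564_463_616

for name, bits in [("fp32", 32), ("fp16", 16), ("Q8_0", 8.5), ("Q4_0", 4.5)]:
    gib = n_params * bits / 8 / 2**30
    print(f"{name:>5}: {gib:.2f} GiB")
```

At fp32 the weights alone come to about 2.1 GiB, which is consistent with the ~3 GB observed once framework overhead is added; a 4-bit quantization would bring the weights under 0.3 GiB.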

Possible Implementation

The convert-hf-to-gguf.py script would need support for XGLMForCausalLM, and llama.cpp itself would probably need changes if the architecture uses any layer type that isn't already implemented. I would be open to helping with the implementation, but I don't know much about LLM architectures in general, or about the llama.cpp project specifically. I looked at a recently merged PR that added support for a new model architecture, but couldn't really follow what was going on. Is there any documentation on how to add new model types? Would adding XGLM be simple, or does it have a nonstandard architecture?

These are the tensors in the XGLM-564M model (all 24 layers, `model.layers.0` through `model.layers.23`, share the same structure, so only layer 0 is listed in full):

```
model.embed_tokens.weight
model.layers.0.self_attn.k_proj.weight
model.layers.0.self_attn.k_proj.bias
model.layers.0.self_attn.v_proj.weight
model.layers.0.self_attn.v_proj.bias
model.layers.0.self_attn.q_proj.weight
model.layers.0.self_attn.q_proj.bias
model.layers.0.self_attn.out_proj.weight
model.layers.0.self_attn.out_proj.bias
model.layers.0.self_attn_layer_norm.weight
model.layers.0.self_attn_layer_norm.bias
model.layers.0.fc1.weight
model.layers.0.fc1.bias
model.layers.0.fc2.weight
model.layers.0.fc2.bias
model.layers.0.final_layer_norm.weight
model.layers.0.final_layer_norm.bias
[... model.layers.1 through model.layers.23, identical structure ...]
model.layer_norm.weight
model.layer_norm.bias
lm_head.weight
```
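For anyone picking this up: a rough sketch of how these checkpoint names might translate to llama.cpp's GGUF naming scheme. The `blk.N.attn_q`-style target names follow the conventions used by other decoder-only architectures in llama.cpp; the mapping itself is hypothetical, since XGLM is not wired into gguf-py's tensor name tables, and the real converter would express this through its own mapping machinery rather than a standalone function.

```python
import re

# Hypothetical HF -> GGUF tensor-name mapping for XGLM, modeled on the naming
# conventions llama.cpp uses for similar decoder-only models.
XGLM_TENSOR_MAP = {
    "self_attn.q_proj":     "attn_q",
    "self_attn.k_proj":     "attn_k",
    "self_attn.v_proj":     "attn_v",
    "self_attn.out_proj":   "attn_output",
    "self_attn_layer_norm": "attn_norm",
    "fc1":                  "ffn_up",
    "fc2":                  "ffn_down",
    "final_layer_norm":     "ffn_norm",
}

def map_xglm_tensor(name: str) -> str:
    """Translate one XGLM checkpoint tensor name to a gguf-style name."""
    if name == "model.embed_tokens.weight":
        return "token_embd.weight"
    if name in ("model.layer_norm.weight", "model.layer_norm.bias"):
        return name.replace("model.layer_norm", "output_norm")
    if name == "lm_head.weight":
        return "output.weight"
    m = re.match(r"model\.layers\.(\d+)\.(.+)\.(weight|bias)$", name)
    if m:
        idx, part, kind = m.groups()
        return f"blk.{idx}.{XGLM_TENSOR_MAP[part]}.{kind}"
    raise ValueError(f"unmapped tensor: {name}")
```

For example, `map_xglm_tensor("model.layers.3.self_attn.q_proj.bias")` yields `"blk.3.attn_q.bias"`.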
This is the config.json file for the 564M model (taken from here):

```json
{
  "activation_dropout": 0,
  "activation_function": "gelu",
  "architectures": ["XGLMForCausalLM"],
  "attention_dropout": 0.1,
  "attention_heads": 16,
  "bos_token_id": 0,
  "d_model": 1024,
  "decoder_start_token_id": 2,
  "dropout": 0.1,
  "eos_token_id": 2,
  "ffn_dim": 4096,
  "init_std": 0.02,
  "layerdrop": 0.0,
  "max_position_embeddings": 2048,
  "model_type": "xglm",
  "num_layers": 24,
  "pad_token_id": 1,
  "scale_embedding": true,
  "transformers_version": "4.16.0.dev0",
  "use_cache": true,
  "vocab_size": 256008
}
```
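As a sanity check on the config, the parameter count works out to the 564M in the model's name, assuming a tied lm_head and sinusoidal (non-learned) position embeddings, which is consistent with the tensor list above (no learned position table appears in it):

```python
# Rough parameter count for xglm-564M from the config values above.
d_model, ffn_dim, n_layers, vocab = 1024, 4096, 24, 256008

embed = vocab * d_model                       # token embeddings (tied with lm_head)
attn  = 4 * (d_model * d_model + d_model)     # q/k/v/out projections + biases
norms = 2 * 2 * d_model                       # two LayerNorms per layer (weight + bias)
ffn   = (d_model * ffn_dim + ffn_dim) + (ffn_dim * d_model + d_model)  # fc1 + fc2
per_layer = attn + norms + ffn

total = embed + n_layers * per_layer + 2 * d_model  # + final model.layer_norm
print(f"{total:,}")  # prints 564,463,616
```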
github-actions[bot] commented 7 months ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 7 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.