ggerganov / ggml

Tensor library for machine learning
MIT License

Feature Request: Support Cerebras BTLM #427

Open andersonbcdefg opened 1 year ago

andersonbcdefg commented 1 year ago

BTLM is Cerebras's 3B model that matches the performance of many 7B models. It would be amazing to be able to quantize this, because it would be fast and high quality to run locally. It doesn't quite fit any of the existing architectures: it's based on CerebrasGPT but also uses ALiBi. Blog here: https://www.cerebras.net/machine-learning/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/

HuggingFace model here: https://huggingface.co/cerebras/btlm-3b-8k-base

bornjre commented 1 year ago

I am trying to give it a go. I have never ported any models before, so it's new for me, but so far it looks fun. I think I have model conversion working: HF repo (mostly based on convert-cerebras-to-ggml.py). I have a couple of questions.

It would be nice if someone experienced could tell me, at a high level, what comes next.

transformer.h.0.attn.c_attn.weight (7680, 2560) float16
transformer.h.0.attn.c_attn.bias  (7680,) float32
transformer.h.0.attn.c_attn.SCB  (7680,) float32
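
On the shapes: 7680 = 3 × 2560, so c_attn looks like the usual GPT-2-style fused Q/K/V projection. If the ggml graph ends up wanting separate Q/K/V tensors, the conversion could split the fused matrix like this (assuming the rows are laid out as Q, then K, then V, which I still need to verify against the modeling code):

import numpy as np

n_embd = 2560  # hidden size, from the shapes above

def split_qkv(c_attn_weight: np.ndarray, c_attn_bias: np.ndarray):
    # Fused projection: first n_embd output rows are Q, then K, then V
    # (the ordering is an assumption to verify).
    assert c_attn_weight.shape == (3 * n_embd, n_embd)
    wq, wk, wv = np.split(c_attn_weight, 3, axis=0)
    bq, bk, bv = np.split(c_attn_bias, 3, axis=0)
    return (wq, bq), (wk, bk), (wv, bv)

Keeping c_attn fused, the way the existing gpt-2 example does, should also work; the split is only needed if the C++ graph builds Q, K and V from separate tensors.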

MODEL

BTLMLMHeadModel(
  (transformer): BTLMModel(
    (wte): Embedding(50257, 2560)
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-31): 32 x BTLMBlock(
        (ln_1): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (attn): BTLMAttention(
          (c_attn): Linear8bitLt(in_features=2560, out_features=7680, bias=True)
          (c_proj): Linear8bitLt(in_features=2560, out_features=2560, bias=True)
          (attn_dropout): Dropout(p=0.0, inplace=False)
          (resid_dropout): Dropout(p=0.0, inplace=False)
        )
        (ln_2): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (mlp): BTLMMLP(
          (c_fc): Linear8bitLt(in_features=2560, out_features=6826, bias=True)
          (c_fc2): Linear8bitLt(in_features=2560, out_features=6826, bias=True)
          (c_proj): Linear8bitLt(in_features=6826, out_features=2560, bias=True)
          (act): SwiGLUActivation()
          (dropout): Dropout(p=0.0, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
    (relative_pe): AlibiPositionEmbeddingLayer()
  )
  (lm_head): Linear(in_features=2560, out_features=50257, bias=False)
)
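
As far as I can tell, the pieces that differ from the existing gpt-2/Cerebras-GPT example are the two-branch SwiGLU MLP (c_fc / c_fc2 / c_proj) and the ALiBi bias. A rough Python reference of the MLP forward, assuming silu is applied to the c_fc2 branch (needs double-checking against modeling_btlm.py):

import torch
import torch.nn.functional as F

def btlm_mlp(x, w_fc, b_fc, w_fc2, b_fc2, w_proj, b_proj):
    # x:           (..., 2560)
    # w_fc, w_fc2: (6826, 2560)  two parallel up-projections
    # w_proj:      (2560, 6826)  down-projection
    h1 = F.linear(x, w_fc, b_fc)
    h2 = F.linear(x, w_fc2, b_fc2)
    # SwiGLU: one branch gates the other through silu
    # (which branch gets the silu is an assumption here).
    h = h1 * F.silu(h2)
    return F.linear(h, w_proj, b_proj)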

Model-loading C++ WIP implementation file:
https://huggingface.co/bornjre/btlm-3b-ggml/blob/main/btlm_model_wip.cpp
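
For the relative_pe / ALiBi part there is nothing extra to serialize, since the bias is computed rather than learned. A minimal sketch of the standard slope formula (assuming the usual power-of-two formulation; the exact variant is in AlibiPositionEmbeddingLayer in modeling_btlm.py and should be checked against that):

def alibi_slopes(n_head: int) -> list[float]:
    # Standard ALiBi: head i gets slope 2^(-8*i/n_head), i = 1..n_head
    # (this closed form assumes n_head is a power of two).
    ratio = 2.0 ** (-8.0 / n_head)
    return [ratio ** i for i in range(1, n_head + 1)]

# The bias added to the attention score of query position q and key
# position k (k <= q) is -slope * (q - k), applied before the softmax.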

bornjre commented 1 year ago

Sorry for the ping :smiley: @iboB @ggerganov

ggerganov commented 1 year ago

I'm not familiar with "SCB" tensors - you have to check how they are used in Python and understand their purpose.

rskuzma commented 1 year ago

@bornjre, I think SCB tensors come from bitsandbytes (https://huggingface.co/blog/hf-bitsandbytes-integration, https://github.com/TimDettmers/bitsandbytes/blob/main/bitsandbytes/nn/modules.py), perhaps as a result of using load_in_8bit=True when loading the model in HF transformers? I don't think this is part of the original model.

xloem commented 11 months ago

The python implementation of this model can be found at https://huggingface.co/cerebras/btlm-3b-8k-base/blob/main/modeling_btlm.py .

The SCB tensors are a result of HuggingFace-side quantization; they would be handled the same way as for any bitsandbytes-quantized model, and can simply be ignored here.
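
In practice the conversion script can just avoid them: load the checkpoint in float16/float32 so bitsandbytes is never involved, or skip any tensor whose name ends in .SCB. Something like this (trust_remote_code is needed for the custom BTLM modeling code; float16 is just an example dtype):

import torch
from transformers import AutoModelForCausalLM

# No load_in_8bit -> bitsandbytes is not used -> no *.SCB tensors appear.
model = AutoModelForCausalLM.from_pretrained(
    "cerebras/btlm-3b-8k-base",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

for name, tensor in model.state_dict().items():
    if name.endswith(".SCB"):
        continue  # defensive: drop bitsandbytes quantization statistics
    # ... write name/tensor to the ggml file as in convert-cerebras-to-ggml.py ...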

You can see the SCB tensors are not present in the model here:

$ curl -sL https://huggingface.co/cerebras/btlm-3b-8k-base/resolve/main/pytorch_model.bin | strings | grep 'transformer.h.0.attn'
transformer.h.0.attn.c_attn.weightq
transformer.h.0.attn.c_attn.biasq&h
transformer.h.0.attn.c_proj.weightq.h
transformer.h.0.attn.c_proj.biasq6h