OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

A KeyError is raised when using the Falcon 40B model converted by CTranslate2 #1348

Closed srimouli04 closed 1 year ago

srimouli04 commented 1 year ago

I have downloaded the model from HF and converted Falcon-40b-instruct using CTranslate2, but when I try to run the converted model I get two errors:

  1. The model type is not present in the config.json created by CTranslate2. I fixed this by manually adding the missing "model_type" parameter, and it worked well for LLaMA-family models.
  2. But when I try the same for the Falcon models, I get a KeyError: "RefinedWeb".

Any pointers on how this can be fixed?

guillaumekln commented 1 year ago

model_type should not appear in the file config.json created by CTranslate2. You don't need to add this parameter.

Can you post the exact steps to reproduce the error? In particular, what conversion command did you use?

srimouli04 commented 1 year ago

Hey @guillaumekln

This is the command I used. I downloaded the model locally and then used that local copy for the conversion.

ct2-transformers-converter --model <model_path>/falcon-40b-instruct --quantization float16 --output_dir falcon-40b-instruct --trust_remote_code

But without model_type, transformers.AutoTokenizer.from_pretrained raises an error, which reads like:

raise ValueError(
ValueError: Unrecognized model in <model_path>/falcon-40b-instruct. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, audio-spectrogram-transformer, autoformer, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, blenderbot, blenderbot-small, blip, blip-2, bloom, bridgetower, camembert, canine, chinese_clip, clap, clip, clipseg, codegen, conditional_detr, convbert, convnext, convnextv2, cpmant, ctrl, cvt, data2vec-audio, data2vec-text, data2vec-vision, deberta, deberta-v2, decision_transformer, deformable_detr, deit, deta, detr, dinat, distilbert, donut-swin, dpr, dpt, efficientformer, efficientnet, electra, encoder-decoder, ernie, ernie_m, esm, flaubert, flava, fnet, focalnet, fsmt, funnel, git, glpn, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gptj, gptsan-japanese, graphormer, groupvit, hubert, ibert, imagegpt, informer, jukebox, layoutlm, layoutlmv2, layoutlmv3, led, levit, lilt, llama, longformer, longt5, luke, lxmert, m2m_100, marian, markuplm, mask2former, maskformer, maskformer-swin, mbart, mctct, mega, megatron-bert, mgp-str, mobilebert, mobilenet_v1, mobilenet_v2, mobilevit, mobilevitv2, mpnet, mt5, mvp, nat, nezha, nllb-moe, nystromformer, oneformer, open-llama, openai-gpt, opt, owlvit, pegasus, pegasus_x, perceiver, pix2struct, plbart, poolformer, prophetnet, qdqbert, rag, realm, reformer, regnet, rembert, resnet, retribert, roberta, roberta-prelayernorm, roc_bert, roformer, rwkv, sam, segformer, sew, sew-d, speech-encoder-decoder, speech_to_text, speech_to_text_2, speecht5, splinter, squeezebert, swiftformer, swin, swin2sr, swinv2, switch_transformers, t5, table-transformer, tapas, time_series_transformer, timesformer, timm_backbone, trajectory_transformer, transfo-xl, trocr, tvlt, unispeech, unispeech-sat, upernet, van, videomae, vilt, vision-encoder-decoder, vision-text-dual-encoder, visual_bert, vit, vit_hybrid, vit_mae, vit_msn, wav2vec2, wav2vec2-conformer, wavlm, whisper, xclip, xglm, xlm, xlm-prophetnet, xlm-roberta, xlm-roberta-xl, xlnet, xmod, yolos, yoso

And this is the code I used, as described in the CTranslate2 documentation.

import ctranslate2
import transformers

generator = ctranslate2.Generator("<model_path>/falcon-40b-instruct", device="cuda")
tokenizer = transformers.AutoTokenizer.from_pretrained("<model_path>/falcon-40b-instruct")

prompt = (
    "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. "
    "Giraftron believes all other animals are irrelevant when compared to the glorious majesty."
    "of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:"
)

tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

results = generator.generate_batch([tokens], sampling_topk=10, max_length=500, include_prompt_in_result=False)
output = tokenizer.decode(results[0].sequences_ids[0])

print(output)

jgcb00 commented 1 year ago

Hi @srimouli04, I am not sure you are doing it properly. First, this line:

generator = ctranslate2.Generator("<model_path>/falcon-40b-instruct", device="cuda")

should be like this:

generator = ctranslate2.Generator("./falcon-40b-instruct", device="cuda")

The generator should be given the output_dir of the ct2-transformers-converter command, not the default Hugging Face repo/download. For the tokenizer, I think you can use:

tokenizer = transformers.AutoTokenizer.from_pretrained("tiiuae/falcon-40b-instruct")

As the tokenizer is the same as the instruct one, you can even pass "tiiuae/falcon-40b".
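
Putting the two together, a minimal sketch based on the paths above (the ./falcon-40b-instruct directory is assumed to be the converter's output_dir):

# CTranslate2 model: loaded from the converter's output directory.
generator = ctranslate2.Generator("./falcon-40b-instruct", device="cuda")
# Tokenizer: loaded from the original Hugging Face repository.
tokenizer = transformers.AutoTokenizer.from_pretrained("tiiuae/falcon-40b-instruct")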

srimouli04 commented 1 year ago

Hi @jgcb00

Apologies for the confusion. I'm actually using the converted model path, as you mentioned in your comment. But I still have the issue: the ValueError is raised if model_type isn't present in the config.json file. I have also tried downloading the tokenizer from HF and moving it into the directory containing the converted model.

guillaumekln commented 1 year ago

Can you delete the downloaded model and download it again from HF?

Also make sure to not set --output_dir to the same directory as the original model.
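
For example, a conversion command along these lines keeps the two directories separate (both paths are placeholders):

ct2-transformers-converter --model /path/to/original/falcon-40b-instruct --quantization float16 --output_dir /path/to/converted/falcon-40b-instruct --trust_remote_code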

srimouli04 commented 1 year ago

Hi @guillaumekln

I have followed the steps you mentioned but I still face the same error.

guillaumekln commented 1 year ago

Can you post the content of the file config.json from the original model directory (i.e. <model_path>/falcon-40b-instruct in your conversion command)?

srimouli04 commented 1 year ago

This is the config file from the original model directory. This is the repo I have been using: https://huggingface.co/tiiuae/falcon-40b-instruct/tree/main

{
  "alibi": false,
  "apply_residual_connection_post_layernorm": false,
  "architectures": [
    "RWForCausalLM"
  ],
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_RW.RWConfig",
    "AutoModelForCausalLM": "modelling_RW.RWForCausalLM"
  },
  "bias": false,
  "bos_token_id": 11,
  "eos_token_id": 11,
  "hidden_dropout": 0.0,
  "hidden_size": 8192,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "RefinedWeb",
  "n_head": 128,
  "n_head_kv": 8,
  "n_layer": 60,
  "parallel_attn": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.26.0",
  "use_cache": true,
  "vocab_size": 65024
}

guillaumekln commented 1 year ago

Ok, I think I understand now what you are doing. You are loading the tokenizer from the converted model directory, but you should load it from the original model. Something like this:

generator = ctranslate2.Generator("/path/to/converted/model", device="cuda")
tokenizer = transformers.AutoTokenizer.from_pretrained("/path/to/original/model")
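
For reference, a minimal end-to-end sketch under that assumption (both paths are placeholders; the prompt is shortened and the sampling settings are taken from the snippet earlier in the thread):

import ctranslate2
import transformers

# Load the CTranslate2 model from the directory produced by ct2-transformers-converter.
generator = ctranslate2.Generator("/path/to/converted/model", device="cuda")
# Load the tokenizer from the original Hugging Face checkpoint (or "tiiuae/falcon-40b-instruct").
tokenizer = transformers.AutoTokenizer.from_pretrained("/path/to/original/model")

prompt = "Daniel: Hello, Girafatron!\nGirafatron:"
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

results = generator.generate_batch([tokens], sampling_topk=10, max_length=500, include_prompt_in_result=False)
print(tokenizer.decode(results[0].sequences_ids[0]))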