R4ZZ3 opened this issue 1 year ago
Seems that my conversion does not work. Any ideas, @ggerganov?
You have to match the tensor names to the ones used by whisper.cpp:
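For example, the Hugging Face transformers checkpoints name their tensors differently from the original OpenAI checkpoints that convert-pt-to-ggml.py expects. A few representative pairs (illustrative only, not the full list):

    model.decoder.layers.0.self_attn.q_proj.weight    ->  decoder.blocks.0.attn.query.weight
    model.decoder.layers.0.encoder_attn.k_proj.weight ->  decoder.blocks.0.cross_attn.key.weight
    model.decoder.layers.0.fc1.weight                 ->  decoder.blocks.0.mlp.0.weight
    model.decoder.embed_tokens.weight                 ->  decoder.token_embedding.weight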
I tried converting the keys with the following script, and the conversion script runs smoothly:
new_checkpoint = {}

# decoder
test_keys = []
i = 0
for key, value in checkpoint.items():
    old_key = key
    if 'decoder.' in key:
        i += 1
        print(i)
        if 'model' in key:
            key = key.replace('model.', '')
        if 'embed_positions' in key:
            key = key.replace('embed_positions', 'positional_embedding')
            key = key.replace('.weight', '')
        elif 'embed_tokens' in key:
            key = key.replace('embed_tokens', 'token_embedding')
        elif 'self_attn.k_proj.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn.k_proj.weight', 'attn.key.weight')
        elif 'self_attn.v_proj.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn.v_proj.weight', 'attn.value.weight')
        elif 'self_attn.v_proj.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn.v_proj.bias', 'attn.value.bias')
        elif 'self_attn.q_proj.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn.q_proj.weight', 'attn.query.weight')
        elif 'self_attn.q_proj.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn.q_proj.bias', 'attn.query.bias')
        elif 'self_attn.out_proj.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn.out_proj.weight', 'attn.out.weight')
        elif 'self_attn.out_proj.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn.out_proj.bias', 'attn.out.bias')
        elif 'self_attn_layer_norm.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn_layer_norm.weight', 'attn_ln.weight')
        elif 'self_attn_layer_norm.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn_layer_norm.bias', 'attn_ln.bias')
        elif 'final_layer_norm.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('final_layer_norm.weight', 'mlp_ln.weight')
        elif 'final_layer_norm.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('final_layer_norm.bias', 'mlp_ln.bias')
        elif 'fc1.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('fc1.weight', 'mlp.0.weight')
        elif 'fc1.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('fc1.bias', 'mlp.0.bias')
        elif 'fc2.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('fc2.weight', 'mlp.2.weight')
        elif 'fc2.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('fc2.bias', 'mlp.2.bias')
        elif 'encoder_attn.k_proj.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('encoder_attn.k_proj.weight', 'cross_attn.key.weight')
        elif 'encoder_attn.v_proj.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('encoder_attn.v_proj.weight', 'cross_attn.value.weight')
        elif 'encoder_attn.v_proj.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('encoder_attn.v_proj.bias', 'cross_attn.value.bias')
        elif 'encoder_attn.q_proj.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('encoder_attn.q_proj.weight', 'cross_attn.query.weight')
        elif 'encoder_attn.q_proj.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('encoder_attn.q_proj.bias', 'cross_attn.query.bias')
        elif 'encoder_attn.out_proj.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('encoder_attn.out_proj.weight', 'cross_attn.out.weight')
        elif 'encoder_attn.out_proj.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('encoder_attn.out_proj.bias', 'cross_attn.out.bias')
        elif 'encoder_attn_layer_norm.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('encoder_attn_layer_norm.weight', 'cross_attn_ln.weight')
        elif 'encoder_attn_layer_norm.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('encoder_attn_layer_norm.bias', 'cross_attn_ln.bias')
        elif key.startswith('decoder.layer_norm'):
            key = key.replace('layer_norm', 'ln')
    elif 'encoder.' in key:
        if 'model' in key:
            key = key.replace('model.', '')
        if 'embed_positions' in key:
            key = key.replace('embed_positions', 'positional_embedding')
            key = key.replace('.weight', '')
        elif 'embed_tokens' in key:
            key = key.replace('embed_tokens', 'token_embedding')
        elif 'self_attn.k_proj.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn.k_proj.weight', 'attn.key.weight')
        elif 'self_attn.v_proj.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn.v_proj.weight', 'attn.value.weight')
        elif 'self_attn.v_proj.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn.v_proj.bias', 'attn.value.bias')
        elif 'self_attn.q_proj.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn.q_proj.weight', 'attn.query.weight')
        elif 'self_attn.q_proj.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn.q_proj.bias', 'attn.query.bias')
        elif 'self_attn.out_proj.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn.out_proj.weight', 'attn.out.weight')
        elif 'self_attn.out_proj.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn.out_proj.bias', 'attn.out.bias')
        elif 'self_attn_layer_norm.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn_layer_norm.weight', 'attn_ln.weight')
        elif 'self_attn_layer_norm.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn_layer_norm.bias', 'attn_ln.bias')
        elif 'final_layer_norm.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('final_layer_norm.weight', 'mlp_ln.weight')
        elif 'final_layer_norm.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('final_layer_norm.bias', 'mlp_ln.bias')
        elif 'fc1.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('fc1.weight', 'mlp.0.weight')
        elif 'fc1.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('fc1.bias', 'mlp.0.bias')
        elif 'fc2.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('fc2.weight', 'mlp.2.weight')
        elif 'fc2.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('fc2.bias', 'mlp.2.bias')
        elif key.startswith('encoder.layer_norm'):
            key = key.replace('layer_norm', 'ln_post')
    # print(f'{old_key} --> {key}')
    # if key not in key_list_in_whisper:
    #     print("KEY NOT FOUND")
    new_checkpoint[key] = value
# ENCODER
test_keys = []
i = 0
for key, value in checkpoint.items():
    old_key = key
    if 'encoder.' in key:
        i += 1
        print(i)
        if 'model' in key:
            key = key.replace('model.', '')
        if 'embed_positions' in key:
            key = key.replace('embed_positions', 'positional_embedding')
            key = key.replace('.weight', '')
        elif 'embed_tokens' in key:
            key = key.replace('embed_tokens', 'token_embedding')
        elif 'self_attn.k_proj.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn.k_proj.weight', 'attn.key.weight')
        elif 'self_attn.v_proj.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn.v_proj.weight', 'attn.value.weight')
        elif 'self_attn.v_proj.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn.v_proj.bias', 'attn.value.bias')
        elif 'self_attn.q_proj.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn.q_proj.weight', 'attn.query.weight')
        elif 'self_attn.q_proj.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn.q_proj.bias', 'attn.query.bias')
        elif 'self_attn.out_proj.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn.out_proj.weight', 'attn.out.weight')
        elif 'self_attn.out_proj.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn.out_proj.bias', 'attn.out.bias')
        elif 'self_attn_layer_norm.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn_layer_norm.weight', 'attn_ln.weight')
        elif 'self_attn_layer_norm.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('self_attn_layer_norm.bias', 'attn_ln.bias')
        elif 'final_layer_norm.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('final_layer_norm.weight', 'mlp_ln.weight')
        elif 'final_layer_norm.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('final_layer_norm.bias', 'mlp_ln.bias')
        elif 'fc1.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('fc1.weight', 'mlp.0.weight')
        elif 'fc1.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('fc1.bias', 'mlp.0.bias')
        elif 'fc2.weight' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('fc2.weight', 'mlp.2.weight')
        elif 'fc2.bias' in key:
            key = key.replace('layers', 'blocks')
            key = key.replace('fc2.bias', 'mlp.2.bias')
        elif key.startswith('encoder.layer_norm'):
            key = key.replace('layer_norm', 'ln_post')
        print(f'{old_key} --> {key}')
        if key not in key_list_in_whisper:
            print("KEY NOT FOUND")
# hyperparameters of the Whisper "small" checkpoint
small_dims = {'n_mels': 80,
              'n_vocab': 51865,
              'n_audio_ctx': 1500,
              'n_audio_state': 768,
              'n_audio_head': 12,
              'n_audio_layer': 12,
              'n_text_ctx': 448,
              'n_text_state': 768,
              'n_text_head': 12,
              'n_text_layer': 12}
checkpoint['dims'] = small_dims

object_with_dims_and_state_dict = {}
object_with_dims_and_state_dict['model_state_dict'] = new_checkpoint
object_with_dims_and_state_dict['dims'] = small_dims
torch.save(object_with_dims_and_state_dict, 'testaa.pt')
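Note that the script assumes checkpoint already holds the fine-tuned Hugging Face state dict. A minimal sketch of how it could be obtained (assuming the transformers package and the model mentioned below):

from transformers import WhisperForConditionalGeneration

# load the fine-tuned HF model and take its raw state dict;
# its keys carry the 'model.encoder.' / 'model.decoder.' prefixes handled above
hf_model = WhisperForConditionalGeneration.from_pretrained("ales/whisper-small-belarusian")
checkpoint = hf_model.state_dict()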
Then I run the conversion, which succeeds, and run the converted model on a test sample.
Then also: @baya created a similar script here: https://colab.research.google.com/github/Vaibhavs10/notebooks/blob/main/transformers_whisper_ckpt_to_OAI.ipynb
I used it as well, but I get the same result:
My conversion command (the script path, followed by the model to convert, the path to the OpenAI whisper repo, and the output path for the new model):
python models/convert-pt-to-ggml.py /mnt/f/Omat_opiskelut/whisper_transformaatio/whisper.cpp/openai_whisper/flozi00_whisper-small-german_OAI /mnt/f/Omat_opiskelut/whisper_transformaatio/whisper.cpp/openai_whisper/whisper ./models/testaa
run command: ./main -m models/testaa/ggml-model.bin -f samples/jfk.wav
Maybe uncomment the following print to get some more info and try to debug:
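Independently of that, a quick sanity check is to dump the tensor names and shapes from the intermediate .pt and compare them against a reference OpenAI checkpoint. A small sketch (the file name testaa.pt comes from the save step above):

import torch

converted = torch.load('testaa.pt', map_location='cpu')
for name, tensor in converted['model_state_dict'].items():
    print(name, tuple(tensor.shape))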
Hi,
I made a small demo here https://huggingface.co/spaces/RASMUS/Whisper-youtube-crosslingual-subtitles which uses these models.
Now I am trying to convert models created during the Hugging Face Whisper fine-tuning event so that they can be used with this implementation. I am not sure if I am doing this correctly, but I would like to see a more streamlined conversion implemented directly in this repo as a script.
I started from this model: https://huggingface.co/ales/whisper-small-belarusian
Download the model following the HF instructions
Save it to disk and convert it to .pt
Then, when trying to transform:
Trying to add dims to the dict object:
Running the conversion again gives a model_state_dict error
Assume the original checkpoint is actually the state dict and create a new object:
Run the conversion:
So now it succeeds. I have yet to test whether the result actually works, but I would like to have this kind of conversion available directly in this repo :)