Open pelinsuacar opened 6 months ago
Firstly, please stop using yolow-v8_l_clipv2_frozen_t2iv2_bn_o365_goldg_pretrain.pth from the HuggingFace Demo.
If you use the code in this repo, please use the pre-trained weights in this repo. The two versions have slight differences.
Secondly, the SimpleYOLOWorldDetector is designed for prompt tuning and the re-parameterized version; please check docs/reparameterize and docs/prompt_yolo_world for more details.
So, should I use YOLOWorldPromptDetector instead? There is no such detector in YOLO-World/yolo_world/models/detectors/yolo_world.py, though.
YOLOWorldPromptDetector has been deprecated. Please use SimpleYOLOWorldDetector instead. However, you might need to refer to fine-tuning-yolo-world to determine which setup fits your case.
Okay, I am a bit confused. Is it possible to test zero-shot inference with an embedding instead of text as a first step? I just want to give the embedding of an object as input, to get rid of the language model, and see whether the model can detect that object in my target image. If I understand correctly, I need to use SimpleYOLOWorldDetector for this, because YOLOWorldDetector includes the language model. However, I couldn't find a way to initialize SimpleYOLOWorldDetector with appropriate weights. Could you please explain whether that's possible? Thank you!
So my question is: where can I find 'pretrained_models/yolo_world_l_clip_t2i_bn_2e-3adamw_32xb16-100e_obj365v1_goldg_cc3mlite_train-ca93cd1f.pth', which is specified in one of the config files for SimpleYOLOWorldDetector?
Please see the model zoo for the pre-trained weights: https://github.com/AILab-CVC/YOLO-World?tab=readme-ov-file#zero-shot-inference-on-lvis-dataset
BTW, SimpleYOLOWorldDetector is a general detector class and does not have a specific pre-trained weight.
After fine-tuning the reparameterized SimpleYOLOWorldDetector, how can I test zero-shot inference? Is it possible to give the model both a target image and the embedding of the object to be detected in that image as inputs at inference time? Since the precomputed text embeddings are converted into the weights of certain layers, there are no dynamic text embeddings during inference; the model relies on the precomputed, integrated embeddings. So in what sense can we say that "Reparameterized YOLO-World still has zero-shot ability" in this case? @wondervictor
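To make the concern above concrete, here is a toy sketch of the difference between scoring against embeddings passed at inference time and scoring against embeddings that have been frozen into layer weights. It is an illustration only, written in plain Python: `dynamic_scores` and `ReparameterizedHead` are hypothetical names, and the real YOLO-World head is more involved than a cosine similarity.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dynamic_scores(feature, embeddings):
    """Score one image feature against embeddings supplied at inference time."""
    f = l2_normalize(feature)
    return [sum(a * b for a, b in zip(f, l2_normalize(e))) for e in embeddings]


class ReparameterizedHead:
    """Toy head (hypothetical): the embeddings are baked in once as fixed weights."""

    def __init__(self, embeddings):
        # After this point the vocabulary is fixed; no embedding input remains.
        self.weight = [l2_normalize(e) for e in embeddings]

    def __call__(self, feature):
        f = l2_normalize(feature)
        return [sum(a * b for a, b in zip(f, w)) for w in self.weight]


embeddings = [[1.0, 0.0, 0.0], [0.0, 2.0, 2.0]]  # made-up class embeddings
feature = [3.0, 4.0, 0.0]                        # made-up image feature

head = ReparameterizedHead(embeddings)
print(dynamic_scores(feature, embeddings))
print(head(feature))
```

Both calls give the same scores for the baked-in classes, but only the dynamic version accepts a new embedding without rebuilding the head. In this toy setting, detecting a new object with the reparameterized head would mean constructing it again from the new embeddings.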
Hello,
Could you provide the weights to try SimpleYOLOWorldDetector with an image embedding as input instead of text? When I load from "yolow-v8_l_clipv2_frozen_t2iv2_bn_o365_goldg_pretrain.pth", it gives an error:
Loads checkpoint by local backend from path: yolow-v8_l_clipv2_frozen_t2iv2_bn_o365_goldg_pretrain.pth
The model and loaded state dict do not match exactly
unexpected key in source state_dict: backbone.text_model.model.text_model.embeddings.token_embedding.weight, backbone.text_model.model.text_model.embeddings.position_embedding.weight, backbone.text_model.model.text_model.encoder.layers.0.self_attn.k_proj.weight, backbone.text_model.model.text_model.encoder.layers.0.self_attn.k_proj.bias, backbone.text_model.model.text_model.encoder.layers.0.self_attn.v_proj.weight, backbone.text_model.model.text_model.encoder.layers.0.self_attn.v_proj.bias, backbone.text_model.model.text_model.encoder.layers.0.self_attn.q_proj.weight, backbone.text_model.model.text_model.encoder.layers.0.self_attn.q_proj.bias, backbone.text_model.model.text_model.encoder.layers.0.self_attn.out_proj.weight, backbone.text_model.model.text_model.encoder.layers.0.self_attn.out_proj.bias, backbone.text_model.model.text_model.encoder.layers.0.layer_norm1.weight, backbone.text_model.model.text_model.encoder.layers.0.layer_norm1.bias, backbone.text_model.model.text_model.encoder.layers.0.mlp.fc1.weight, backbone.text_model.model.text_model.encoder.layers.0.mlp.fc1.bias, backbone.text_model.model.text_model.encoder.layers.0.mlp.fc2.weight, backbone.text_model.model.text_model.encoder.layers.0.mlp.fc2.bias, backbone.text_model.model.text_model.encoder.layers.0.layer_norm2.weight, backbone.text_model.model.text_model.encoder.layers.0.layer_norm2.bias, backbone.text_model.model.text_model.encoder.layers.1.self_attn.k_proj.weight, backbone.text_model.model.text_model.encoder.layers.1.self_attn.k_proj.bias, backbone.text_model.model.text_model.encoder.layers.1.self_attn.v_proj.weight, backbone.text_model.model.text_model.encoder.layers.1.self_attn.v_proj.bias, backbone.text_model.model.text_model.encoder.layers.1.self_attn.q_proj.weight, backbone.text_model.model.text_model.encoder.layers.1.self_attn.q_proj.bias, backbone.text_model.model.text_model.encoder.layers.1.self_attn.out_proj.weight, 
backbone.text_model.model.text_model.encoder.layers.1.self_attn.out_proj.bias, backbone.text_model.model.text_model.encoder.layers.1.layer_norm1.weight, backbone.text_model.model.text_model.encoder.layers.1.layer_norm1.bias, backbone.text_model.model.text_model.encoder.layers.1.mlp.fc1.weight, backbone.text_model.model.text_model.encoder.layers.1.mlp.fc1.bias, backbone.text_model.model.text_model.encoder.layers.1.mlp.fc2.weight, backbone.text_model.model.text_model.encoder.layers.1.mlp.fc2.bias, backbone.text_model.model.text_model.encoder.layers.1.layer_norm2.weight, backbone.text_model.model.text_model.encoder.layers.1.layer_norm2.bias, backbone.text_model.model.text_model.encoder.layers.2.self_attn.k_proj.weight, backbone.text_model.model.text_model.encoder.layers.2.self_attn.k_proj.bias, backbone.text_model.model.text_model.encoder.layers.2.self_attn.v_proj.weight, backbone.text_model.model.text_model.encoder.layers.2.self_attn.v_proj.bias, backbone.text_model.model.text_model.encoder.layers.2.self_attn.q_proj.weight, backbone.text_model.model.text_model.encoder.layers.2.self_attn.q_proj.bias, backbone.text_model.model.text_model.encoder.layers.2.self_attn.out_proj.weight, backbone.text_model.model.text_model.encoder.layers.2.self_attn.out_proj.bias, backbone.text_model.model.text_model.encoder.layers.2.layer_norm1.weight, backbone.text_model.model.text_model.encoder.layers.2.layer_norm1.bias, backbone.text_model.model.text_model.encoder.layers.2.mlp.fc1.weight, backbone.text_model.model.text_model.encoder.layers.2.mlp.fc1.bias, backbone.text_model.model.text_model.encoder.layers.2.mlp.fc2.weight, backbone.text_model.model.text_model.encoder.layers.2.mlp.fc2.bias, backbone.text_model.model.text_model.encoder.layers.2.layer_norm2.weight, backbone.text_model.model.text_model.encoder.layers.2.layer_norm2.bias, backbone.text_model.model.text_model.encoder.layers.3.self_attn.k_proj.weight, backbone.text_model.model.text_model.encoder.layers.3.self_attn.k_proj.bias, 
backbone.text_model.model.text_model.encoder.layers.3.self_attn.v_proj.weight, backbone.text_model.model.text_model.encoder.layers.3.self_attn.v_proj.bias, backbone.text_model.model.text_model.encoder.layers.3.self_attn.q_proj.weight, backbone.text_model.model.text_model.encoder.layers.3.self_attn.q_proj.bias, backbone.text_model.model.text_model.encoder.layers.3.self_attn.out_proj.weight, backbone.text_model.model.text_model.encoder.layers.3.self_attn.out_proj.bias, backbone.text_model.model.text_model.encoder.layers.3.layer_norm1.weight, backbone.text_model.model.text_model.encoder.layers.3.layer_norm1.bias, backbone.text_model.model.text_model.encoder.layers.3.mlp.fc1.weight, backbone.text_model.model.text_model.encoder.layers.3.mlp.fc1.bias, backbone.text_model.model.text_model.encoder.layers.3.mlp.fc2.weight, backbone.text_model.model.text_model.encoder.layers.3.mlp.fc2.bias, backbone.text_model.model.text_model.encoder.layers.3.layer_norm2.weight, backbone.text_model.model.text_model.encoder.layers.3.layer_norm2.bias, backbone.text_model.model.text_model.encoder.layers.4.self_attn.k_proj.weight, backbone.text_model.model.text_model.encoder.layers.4.self_attn.k_proj.bias, backbone.text_model.model.text_model.encoder.layers.4.self_attn.v_proj.weight, backbone.text_model.model.text_model.encoder.layers.4.self_attn.v_proj.bias, backbone.text_model.model.text_model.encoder.layers.4.self_attn.q_proj.weight, backbone.text_model.model.text_model.encoder.layers.4.self_attn.q_proj.bias, backbone.text_model.model.text_model.encoder.layers.4.self_attn.out_proj.weight, backbone.text_model.model.text_model.encoder.layers.4.self_attn.out_proj.bias, backbone.text_model.model.text_model.encoder.layers.4.layer_norm1.weight, backbone.text_model.model.text_model.encoder.layers.4.layer_norm1.bias, backbone.text_model.model.text_model.encoder.layers.4.mlp.fc1.weight, backbone.text_model.model.text_model.encoder.layers.4.mlp.fc1.bias, 
backbone.text_model.model.text_model.encoder.layers.4.mlp.fc2.weight, backbone.text_model.model.text_model.encoder.layers.4.mlp.fc2.bias, backbone.text_model.model.text_model.encoder.layers.4.layer_norm2.weight, backbone.text_model.model.text_model.encoder.layers.4.layer_norm2.bias, backbone.text_model.model.text_model.encoder.layers.5.self_attn.k_proj.weight, backbone.text_model.model.text_model.encoder.layers.5.self_attn.k_proj.bias, backbone.text_model.model.text_model.encoder.layers.5.self_attn.v_proj.weight, backbone.text_model.model.text_model.encoder.layers.5.self_attn.v_proj.bias, backbone.text_model.model.text_model.encoder.layers.5.self_attn.q_proj.weight, backbone.text_model.model.text_model.encoder.layers.5.self_attn.q_proj.bias, backbone.text_model.model.text_model.encoder.layers.5.self_attn.out_proj.weight, backbone.text_model.model.text_model.encoder.layers.5.self_attn.out_proj.bias, backbone.text_model.model.text_model.encoder.layers.5.layer_norm1.weight, backbone.text_model.model.text_model.encoder.layers.5.layer_norm1.bias, backbone.text_model.model.text_model.encoder.layers.5.mlp.fc1.weight, backbone.text_model.model.text_model.encoder.layers.5.mlp.fc1.bias, backbone.text_model.model.text_model.encoder.layers.5.mlp.fc2.weight, backbone.text_model.model.text_model.encoder.layers.5.mlp.fc2.bias, backbone.text_model.model.text_model.encoder.layers.5.layer_norm2.weight, backbone.text_model.model.text_model.encoder.layers.5.layer_norm2.bias, backbone.text_model.model.text_model.encoder.layers.6.self_attn.k_proj.weight, backbone.text_model.model.text_model.encoder.layers.6.self_attn.k_proj.bias, backbone.text_model.model.text_model.encoder.layers.6.self_attn.v_proj.weight, backbone.text_model.model.text_model.encoder.layers.6.self_attn.v_proj.bias, backbone.text_model.model.text_model.encoder.layers.6.self_attn.q_proj.weight, backbone.text_model.model.text_model.encoder.layers.6.self_attn.q_proj.bias, 
backbone.text_model.model.text_model.encoder.layers.6.self_attn.out_proj.weight, backbone.text_model.model.text_model.encoder.layers.6.self_attn.out_proj.bias, backbone.text_model.model.text_model.encoder.layers.6.layer_norm1.weight, backbone.text_model.model.text_model.encoder.layers.6.layer_norm1.bias, backbone.text_model.model.text_model.encoder.layers.6.mlp.fc1.weight, backbone.text_model.model.text_model.encoder.layers.6.mlp.fc1.bias, backbone.text_model.model.text_model.encoder.layers.6.mlp.fc2.weight, backbone.text_model.model.text_model.encoder.layers.6.mlp.fc2.bias, backbone.text_model.model.text_model.encoder.layers.6.layer_norm2.weight, backbone.text_model.model.text_model.encoder.layers.6.layer_norm2.bias, backbone.text_model.model.text_model.encoder.layers.7.self_attn.k_proj.weight, backbone.text_model.model.text_model.encoder.layers.7.self_attn.k_proj.bias, backbone.text_model.model.text_model.encoder.layers.7.self_attn.v_proj.weight, backbone.text_model.model.text_model.encoder.layers.7.self_attn.v_proj.bias, backbone.text_model.model.text_model.encoder.layers.7.self_attn.q_proj.weight, backbone.text_model.model.text_model.encoder.layers.7.self_attn.q_proj.bias, backbone.text_model.model.text_model.encoder.layers.7.self_attn.out_proj.weight, backbone.text_model.model.text_model.encoder.layers.7.self_attn.out_proj.bias, backbone.text_model.model.text_model.encoder.layers.7.layer_norm1.weight, backbone.text_model.model.text_model.encoder.layers.7.layer_norm1.bias, backbone.text_model.model.text_model.encoder.layers.7.mlp.fc1.weight, backbone.text_model.model.text_model.encoder.layers.7.mlp.fc1.bias, backbone.text_model.model.text_model.encoder.layers.7.mlp.fc2.weight, backbone.text_model.model.text_model.encoder.layers.7.mlp.fc2.bias, backbone.text_model.model.text_model.encoder.layers.7.layer_norm2.weight, backbone.text_model.model.text_model.encoder.layers.7.layer_norm2.bias, 
backbone.text_model.model.text_model.encoder.layers.8.self_attn.k_proj.weight, backbone.text_model.model.text_model.encoder.layers.8.self_attn.k_proj.bias, backbone.text_model.model.text_model.encoder.layers.8.self_attn.v_proj.weight, backbone.text_model.model.text_model.encoder.layers.8.self_attn.v_proj.bias, backbone.text_model.model.text_model.encoder.layers.8.self_attn.q_proj.weight, backbone.text_model.model.text_model.encoder.layers.8.self_attn.q_proj.bias, backbone.text_model.model.text_model.encoder.layers.8.self_attn.out_proj.weight, backbone.text_model.model.text_model.encoder.layers.8.self_attn.out_proj.bias, backbone.text_model.model.text_model.encoder.layers.8.layer_norm1.weight, backbone.text_model.model.text_model.encoder.layers.8.layer_norm1.bias, backbone.text_model.model.text_model.encoder.layers.8.mlp.fc1.weight, backbone.text_model.model.text_model.encoder.layers.8.mlp.fc1.bias, backbone.text_model.model.text_model.encoder.layers.8.mlp.fc2.weight, backbone.text_model.model.text_model.encoder.layers.8.mlp.fc2.bias, backbone.text_model.model.text_model.encoder.layers.8.layer_norm2.weight, backbone.text_model.model.text_model.encoder.layers.8.layer_norm2.bias, backbone.text_model.model.text_model.encoder.layers.9.self_attn.k_proj.weight, backbone.text_model.model.text_model.encoder.layers.9.self_attn.k_proj.bias, backbone.text_model.model.text_model.encoder.layers.9.self_attn.v_proj.weight, backbone.text_model.model.text_model.encoder.layers.9.self_attn.v_proj.bias, backbone.text_model.model.text_model.encoder.layers.9.self_attn.q_proj.weight, backbone.text_model.model.text_model.encoder.layers.9.self_attn.q_proj.bias, backbone.text_model.model.text_model.encoder.layers.9.self_attn.out_proj.weight, backbone.text_model.model.text_model.encoder.layers.9.self_attn.out_proj.bias, backbone.text_model.model.text_model.encoder.layers.9.layer_norm1.weight, backbone.text_model.model.text_model.encoder.layers.9.layer_norm1.bias, 
backbone.text_model.model.text_model.encoder.layers.9.mlp.fc1.weight, backbone.text_model.model.text_model.encoder.layers.9.mlp.fc1.bias, backbone.text_model.model.text_model.encoder.layers.9.mlp.fc2.weight, backbone.text_model.model.text_model.encoder.layers.9.mlp.fc2.bias, backbone.text_model.model.text_model.encoder.layers.9.layer_norm2.weight, backbone.text_model.model.text_model.encoder.layers.9.layer_norm2.bias, backbone.text_model.model.text_model.encoder.layers.10.self_attn.k_proj.weight, backbone.text_model.model.text_model.encoder.layers.10.self_attn.k_proj.bias, backbone.text_model.model.text_model.encoder.layers.10.self_attn.v_proj.weight, backbone.text_model.model.text_model.encoder.layers.10.self_attn.v_proj.bias, backbone.text_model.model.text_model.encoder.layers.10.self_attn.q_proj.weight, backbone.text_model.model.text_model.encoder.layers.10.self_attn.q_proj.bias, backbone.text_model.model.text_model.encoder.layers.10.self_attn.out_proj.weight, backbone.text_model.model.text_model.encoder.layers.10.self_attn.out_proj.bias, backbone.text_model.model.text_model.encoder.layers.10.layer_norm1.weight, backbone.text_model.model.text_model.encoder.layers.10.layer_norm1.bias, backbone.text_model.model.text_model.encoder.layers.10.mlp.fc1.weight, backbone.text_model.model.text_model.encoder.layers.10.mlp.fc1.bias, backbone.text_model.model.text_model.encoder.layers.10.mlp.fc2.weight, backbone.text_model.model.text_model.encoder.layers.10.mlp.fc2.bias, backbone.text_model.model.text_model.encoder.layers.10.layer_norm2.weight, backbone.text_model.model.text_model.encoder.layers.10.layer_norm2.bias, backbone.text_model.model.text_model.encoder.layers.11.self_attn.k_proj.weight, backbone.text_model.model.text_model.encoder.layers.11.self_attn.k_proj.bias, backbone.text_model.model.text_model.encoder.layers.11.self_attn.v_proj.weight, backbone.text_model.model.text_model.encoder.layers.11.self_attn.v_proj.bias, 
backbone.text_model.model.text_model.encoder.layers.11.self_attn.q_proj.weight, backbone.text_model.model.text_model.encoder.layers.11.self_attn.q_proj.bias, backbone.text_model.model.text_model.encoder.layers.11.self_attn.out_proj.weight, backbone.text_model.model.text_model.encoder.layers.11.self_attn.out_proj.bias, backbone.text_model.model.text_model.encoder.layers.11.layer_norm1.weight, backbone.text_model.model.text_model.encoder.layers.11.layer_norm1.bias, backbone.text_model.model.text_model.encoder.layers.11.mlp.fc1.weight, backbone.text_model.model.text_model.encoder.layers.11.mlp.fc1.bias, backbone.text_model.model.text_model.encoder.layers.11.mlp.fc2.weight, backbone.text_model.model.text_model.encoder.layers.11.mlp.fc2.bias, backbone.text_model.model.text_model.encoder.layers.11.layer_norm2.weight, backbone.text_model.model.text_model.encoder.layers.11.layer_norm2.bias, backbone.text_model.model.text_model.final_layer_norm.weight, backbone.text_model.model.text_model.final_layer_norm.bias, backbone.text_model.model.text_projection.weight
missing keys in source state_dict: embeddings
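The log above reports two separate mismatches: the checkpoint carries text-encoder parameters (keys starting with backbone.text_model.) that the model does not have, and the model expects an embeddings entry that the checkpoint does not contain. A minimal sketch of how one might inspect and filter such a mismatch is shown below. It uses plain dictionaries in place of real tensors, the helper names are hypothetical, and the key backbone.image_model.stem.conv.weight is a made-up placeholder rather than a key taken from the log.

```python
def drop_prefixed_keys(state_dict, prefix="backbone.text_model."):
    """Return a copy of state_dict without the entries under `prefix`."""
    return {k: v for k, v in state_dict.items() if not k.startswith(prefix)}


def diff_keys(model_keys, checkpoint_keys):
    """Return (unexpected, missing) key lists, mirroring the log's two categories."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    missing = sorted(model_keys - checkpoint_keys)
    return unexpected, missing


# Toy stand-ins: values would be tensors in a real checkpoint.
checkpoint = {
    "backbone.text_model.model.text_projection.weight": 0,
    "backbone.image_model.stem.conv.weight": 1,  # hypothetical key
}
model_keys = ["backbone.image_model.stem.conv.weight", "embeddings"]

filtered = drop_prefixed_keys(checkpoint)
unexpected, missing = diff_keys(model_keys, filtered)
print("unexpected:", unexpected)
print("missing:", missing)
```

Filtering removes the unexpected text-encoder keys, but embeddings is still reported as missing: the checkpoint simply has no such tensor, so it would have to be provided separately (for example, from precomputed embeddings) before the detector can produce meaningful predictions.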