Open kimwongyuda opened 1 month ago
For the first question, this message is not significant and does not affect the training or evaluation results because RoPE does not contain any learnable parameters.
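As a minimal illustration (a toy module, not the actual EVA-CLIP code; the names below are made up), the cos/sin tables of a rotary embedding are typically registered as buffers that are recomputed from hyper-parameters at construction time, so a checkpoint that lacks them produces missing_keys entries without losing any information:

```python
import torch
import torch.nn as nn


class SimpleRoPE(nn.Module):
    """Toy rotary position embedding: the cos/sin tables are derived
    from hyper-parameters, so they carry no learned information."""

    def __init__(self, dim: int, seq_len: int, theta: float = 10000.0):
        super().__init__()
        inv_freq = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
        t = torch.arange(seq_len).float()
        freqs = torch.outer(t, inv_freq)  # (seq_len, dim/2)
        # Registered as buffers, not nn.Parameter: they show up in the
        # state_dict key set but receive no gradients during training.
        self.register_buffer("freqs_cos", freqs.cos())
        self.register_buffer("freqs_sin", freqs.sin())


model = nn.Sequential()
model.add_module("rope", SimpleRoPE(dim=64, seq_len=16))

# A checkpoint saved without these buffers triggers missing_keys on load...
ckpt = {k: v for k, v in model.state_dict().items() if "freqs" not in k}
result = model.load_state_dict(ckpt, strict=False)
print(result.missing_keys)  # ['rope.freqs_cos', 'rope.freqs_sin']

# ...but the values were already recomputed in __init__, so the model
# behaves exactly as if they had been loaded from the checkpoint.
```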
For the second question, you can handle prom_img_attention_mask strictly according to prompt_embedding_output. However, the current method produces equivalent results: the mask only needs to identify the padding tokens at the end, padding only occurs in the text portion, and the number of image tokens is always constant.
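As a rough sketch of that equivalence (prom_img_attention_mask is the repo's name, but the construction and the token count below are assumptions, with the image tokens taken to come before the text, as implied by padding appearing only at the end): prepending a constant block of ones for the image tokens to the text attention mask marks exactly the positions that a mask derived from the concatenated embeddings would mark.

```python
import torch

num_img_tokens = 257          # assumed constant number of image tokens per sample
text_attention_mask = torch.tensor([
    [1, 1, 1, 1, 0, 0],       # text is padded only at the end
    [1, 1, 1, 1, 1, 1],
])
batch_size = text_attention_mask.size(0)

# "Simple" mask: image tokens are always valid, so a constant block of
# ones is prepended to the text mask.
img_part = torch.ones(batch_size, num_img_tokens, dtype=text_attention_mask.dtype)
prom_img_attention_mask = torch.cat([img_part, text_attention_mask], dim=1)

# Building the mask strictly from the concatenated embedding output would
# flag the same positions, because the only invalid tokens are the text
# padding at the end; the image portion never contains padding.
print(prom_img_attention_mask.shape)  # (2, 257 + 6)
```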
I have two questions about the fine-tuning implementation.
First, when the EVA vision model is created in modeling_ds_cirr.py with

```python
self.model_visual, self.preprocess_train, self.preprocess_val = create_eva_vision_and_transforms(
    model_name_eva, eva_pretrained_path, force_custom_clip=True
)
```

the following missing_keys warning appears:

```
[11-10-2024 14:13:40] INFO: incompatible_keys.missing_keys: ['visual.rope.freqs_cos', 'visual.rope.freqs_sin', 'visual.blocks.0.attn.rope.freqs_cos', 'visual.blocks.0.attn.rope.freqs_sin', 'visual.blocks.1.attn.rope.freqs_cos', 'visual.blocks.1.attn.rope.freqs_sin', 'visual.blocks.2.attn.rope.freqs_cos', 'visual.blocks.2.attn.rope.freqs_sin', 'visual.blocks.3.attn.rope.freqs_cos', 'visual.blocks.3.attn.rope.freqs_sin', 'visual.blocks.4.attn.rope.freqs_cos', 'visual.blocks.4.attn.rope.freqs_sin', 'visual.blocks.5.attn.rope.freqs_cos', 'visual.blocks.5.attn.rope.freqs_sin', 'visual.blocks.6.attn.rope.freqs_cos', 'visual.blocks.6.attn.rope.freqs_sin', 'visual.blocks.7.attn.rope.freqs_cos', 'visual.blocks.7.attn.rope.freqs_sin', 'visual.blocks.8.attn.rope.freqs_cos', 'visual.blocks.8.attn.rope.freqs_sin', 'visual.blocks.9.attn.rope.freqs_cos', 'visual.blocks.9.attn.rope.freqs_sin', 'visual.blocks.10.attn.rope.freqs_cos', 'visual.blocks.10.attn.rope.freqs_sin', 'visual.blocks.11.attn.rope.freqs_cos', 'visual.blocks.11.attn.rope.freqs_sin']
```
However, when I load the eva_clip weights via VISTA_Evaluation_FineTuning/evaluation_example_webqa/BGE-base/modeling_evaluation_base.py, that message does not appear. Why does the warning occur in the former but not in the latter?
Second, if I fine-tune using the pre-trained weight Visualized_base_en_v1.5.pth, I expect that the issue above does not matter, because the parameters in the pre-trained checkpoint fill every layer of the model without gaps. Is my expectation correct?
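One way to sanity-check that expectation, sketched under the assumption that the .pth file is a plain state_dict (it may instead be wrapped, e.g. under a "model" key), is to load it with strict=False and see whether anything other than the recomputed RoPE tables is reported as missing:

```python
import torch
from torch import nn


def check_checkpoint_coverage(model: nn.Module, ckpt_path: str) -> None:
    """Report which model parameters/buffers the checkpoint does not cover."""
    # Assumption: the .pth file is a plain state_dict; unwrap it first if
    # it is stored under an extra key.
    state_dict = torch.load(ckpt_path, map_location="cpu")
    result = model.load_state_dict(state_dict, strict=False)

    # RoPE cos/sin tables are recomputed at construction time, so missing
    # entries for them are harmless; anything else missing would really
    # start fine-tuning from random initialization.
    rope_missing = [k for k in result.missing_keys if "rope.freqs" in k]
    other_missing = [k for k in result.missing_keys if "rope.freqs" not in k]

    print(f"missing rope tables (safe):  {len(rope_missing)}")
    print(f"other missing keys:          {other_missing}")
    print(f"unexpected checkpoint keys:  {result.unexpected_keys}")


# Hypothetical usage with the checkpoint from the question:
# check_checkpoint_coverage(my_vista_model, "Visualized_base_en_v1.5.pth")
```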