According to the link you provided,The pretrained weight and Instruction tuning weight is same。
CLI Inference Guide got error:
Some weights of MPLUGOwl2LlamaForCausalLM were not initialized from the model checkpoint at MAGAer13/mplug-owl2-llama2-7b and are newly initialized: ['model.visual_abstractor.encoder.layers.3.crossattention.attention.k_pos_embed', 'model.visual_abstractor.encoder.layers.0.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.5.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.4.crossattention.attention.k_pos_embed', 'model.visual_abstractor.encoder.layers.0.crossattention.attention.k_pos_embed', 'model.visual_abstractor.encoder.layers.1.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.2.crossattention.attention.k_pos_embed', 'model.visual_abstractor.encoder.layers.3.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.1.crossattention.attention.k_pos_embed', 'model.visual_abstractor.encoder.layers.4.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.2.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.5.crossattention.attention.k_pos_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
USER: exit...
According to the link you provided,The pretrained weight and Instruction tuning weight is same。
CLI Inference Guide got error: Some weights of MPLUGOwl2LlamaForCausalLM were not initialized from the model checkpoint at MAGAer13/mplug-owl2-llama2-7b and are newly initialized: ['model.visual_abstractor.encoder.layers.3.crossattention.attention.k_pos_embed', 'model.visual_abstractor.encoder.layers.0.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.5.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.4.crossattention.attention.k_pos_embed', 'model.visual_abstractor.encoder.layers.0.crossattention.attention.k_pos_embed', 'model.visual_abstractor.encoder.layers.1.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.2.crossattention.attention.k_pos_embed', 'model.visual_abstractor.encoder.layers.3.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.1.crossattention.attention.k_pos_embed', 'model.visual_abstractor.encoder.layers.4.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.2.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.5.crossattention.attention.k_pos_embed'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. USER: exit...