OpenMOSS / AnyGPT

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
779 stars 61 forks source link

Some weights of BertLMHeadModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized #19

Closed rrscholarship closed 7 months ago

rrscholarship commented 7 months ago

commend I run !python anygpt/src/infer/cli_infer_base_model.py \ --model-name-or-path AnyGPT-base \ --image-tokenizer-path models/seed-tokenizer-2/seed_quantizer.pt \ --speech-tokenizer-path models/speechtokenizer/ckpt.dev \ --speech-tokenizer-config models/speechtokenizer/config.json \ --soundstorm-path models/soundstorm/speechtokenizer_soundstorm_mls.pt \ --output-dir "infer_output/base"

Below is the error

NeMo-text-processing :: INFO :: Creating ClassifyFst grammars. Using device: cuda loading image tokenzier /home//.cache/torch/hub/checkpoints/eva_vit_g.pth INFO:root:freeze vision encoder Some weights of BertLMHeadModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['bert.encoder.layer.11.output_query.dense.weight', 'bert.encoder.layer.0.crossattention.self.value.weight', 'bert.encoder.layer.5.output_query.LayerNorm.bias', 'bert.encoder.layer.8.output_query.LayerNorm.weight', 'bert.encoder.layer.2.crossattention.self.query.weight', 'bert.encoder.layer.10.crossattention.output.dense.bias', 'bert.encoder.layer.5.output_query.dense.weight', 'bert.encoder.layer.2.output_query.LayerNorm.weight', 'bert.encoder.layer.7.output_query.LayerNorm.bias', 'bert.encoder.layer.7.intermediate_query.dense.bias', 'bert.encoder.layer.6.output_query.LayerNorm.bias', 'bert.encoder.layer.11.output_query.dense.bias', 'bert.encoder.layer.1.intermediate_query.dense.bias', 'bert.encoder.layer.6.output_query.dense.bias', 'bert.encoder.layer.9.intermediate_query.dense.bias', 'bert.encoder.layer.11.intermediate_query.dense.weight', 'bert.encoder.layer.6.crossattention.output.dense.weight', 'bert.encoder.layer.3.output_query.LayerNorm.bias', 'bert.encoder.layer.8.crossattention.self.key.weight', 'bert.encoder.layer.0.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.2.output_query.dense.weight', 'bert.encoder.layer.0.crossattention.self.key.bias', 'bert.encoder.layer.6.crossattention.self.query.weight', 'bert.encoder.layer.8.crossattention.self.value.weight', 'bert.encoder.layer.8.crossattention.output.dense.weight', 'bert.encoder.layer.8.crossattention.output.dense.bias', 'bert.encoder.layer.10.output_query.LayerNorm.weight', 'bert.encoder.layer.10.output_query.dense.weight', 'bert.encoder.layer.6.crossattention.self.query.bias', 'bert.encoder.layer.6.output_query.LayerNorm.weight', 'bert.encoder.layer.6.crossattention.self.value.bias', 'bert.encoder.layer.2.crossattention.self.value.weight', 'bert.encoder.layer.8.intermediate_query.dense.weight', 'bert.encoder.layer.2.output_query.LayerNorm.bias', 'bert.encoder.layer.6.crossattention.output.dense.bias', 'bert.encoder.layer.4.intermediate_query.dense.bias', 'bert.encoder.layer.10.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.2.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.1.intermediate_query.dense.weight', 'bert.encoder.layer.4.crossattention.self.key.weight', 'bert.encoder.layer.2.crossattention.self.query.bias', 'bert.encoder.layer.7.intermediate_query.dense.weight', 'bert.encoder.layer.10.crossattention.self.query.weight', 'bert.encoder.layer.9.intermediate_query.dense.weight', 'bert.encoder.layer.6.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.9.output_query.LayerNorm.bias', 'bert.encoder.layer.3.intermediate_query.dense.weight', 'bert.encoder.layer.0.crossattention.self.query.weight', 'bert.encoder.layer.0.crossattention.self.value.bias', 'bert.encoder.layer.8.output_query.LayerNorm.bias', 'bert.encoder.layer.4.output_query.dense.bias', 'bert.encoder.layer.2.crossattention.self.key.bias', 'bert.encoder.layer.1.output_query.LayerNorm.bias', 'bert.encoder.layer.4.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.6.crossattention.self.value.weight', 'bert.encoder.layer.4.crossattention.self.value.weight', 'bert.encoder.layer.0.output_query.LayerNorm.bias', 'bert.encoder.layer.9.output_query.LayerNorm.weight', 'bert.encoder.layer.4.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.0.crossattention.output.dense.weight', 'bert.encoder.layer.7.output_query.LayerNorm.weight', 'bert.encoder.layer.8.crossattention.self.key.bias', 'bert.encoder.layer.8.output_query.dense.bias', 'bert.encoder.layer.0.intermediate_query.dense.weight', 'bert.encoder.layer.2.intermediate_query.dense.weight', 'bert.encoder.layer.0.crossattention.output.dense.bias', 'bert.encoder.layer.0.crossattention.self.query.bias', 'bert.encoder.layer.3.output_query.LayerNorm.weight', 'bert.encoder.layer.6.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.10.crossattention.self.value.weight', 'bert.encoder.layer.2.crossattention.self.value.bias', 'bert.encoder.layer.11.output_query.LayerNorm.bias', 'bert.encoder.layer.6.crossattention.self.key.weight', 'bert.encoder.layer.4.crossattention.self.key.bias', 'bert.encoder.layer.0.output_query.dense.weight', 'bert.encoder.layer.4.crossattention.self.query.weight', 'bert.encoder.layer.6.crossattention.self.key.bias', 'bert.encoder.layer.5.intermediate_query.dense.weight', 'bert.encoder.layer.1.output_query.dense.weight', 'bert.encoder.layer.5.output_query.LayerNorm.weight', 'bert.encoder.layer.2.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.9.output_query.dense.weight', 'bert.encoder.layer.4.crossattention.self.query.bias', 'bert.encoder.layer.11.intermediate_query.dense.bias', 'bert.encoder.layer.6.output_query.dense.weight', 'bert.encoder.layer.5.output_query.dense.bias', 'bert.encoder.layer.6.intermediate_query.dense.weight', 'bert.encoder.layer.2.output_query.dense.bias', 'bert.encoder.layer.8.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.4.crossattention.self.value.bias', 'bert.encoder.layer.1.output_query.LayerNorm.weight', 'bert.encoder.layer.10.output_query.LayerNorm.bias', 'bert.encoder.layer.3.output_query.dense.weight', 'bert.encoder.layer.4.output_query.LayerNorm.weight', 'bert.encoder.layer.8.crossattention.self.value.bias', 'bert.encoder.layer.8.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.10.output_query.dense.bias', 'bert.encoder.layer.8.crossattention.self.query.bias', 'bert.encoder.layer.4.intermediate_query.dense.weight', 'bert.encoder.layer.0.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.0.crossattention.self.key.weight', 'bert.encoder.layer.10.crossattention.self.key.bias', 'bert.encoder.layer.0.output_query.dense.bias', 'bert.encoder.layer.4.output_query.LayerNorm.bias', 'bert.encoder.layer.3.output_query.dense.bias', 'bert.encoder.layer.7.output_query.dense.bias', 'bert.encoder.layer.3.intermediate_query.dense.bias', 'bert.encoder.layer.1.output_query.dense.bias', 'bert.encoder.layer.4.output_query.dense.weight', 'bert.encoder.layer.10.crossattention.output.dense.weight', 'bert.encoder.layer.8.crossattention.self.query.weight', 'bert.encoder.layer.10.crossattention.self.query.bias', 'bert.encoder.layer.9.output_query.dense.bias', 'bert.encoder.layer.4.crossattention.output.dense.bias', 'bert.encoder.layer.7.output_query.dense.weight', 'bert.encoder.layer.2.intermediate_query.dense.bias', 'bert.encoder.layer.10.crossattention.self.value.bias', 'bert.encoder.layer.10.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.0.output_query.LayerNorm.weight', 'bert.encoder.layer.5.intermediate_query.dense.bias', 'bert.encoder.layer.4.crossattention.output.dense.weight', 'bert.encoder.layer.8.output_query.dense.weight', 'bert.encoder.layer.6.intermediate_query.dense.bias', 'bert.encoder.layer.2.crossattention.output.dense.weight', 'bert.encoder.layer.10.intermediate_query.dense.weight', 'bert.encoder.layer.0.intermediate_query.dense.bias', 'bert.encoder.layer.2.crossattention.output.dense.bias', 'bert.encoder.layer.10.intermediate_query.dense.bias', 'bert.encoder.layer.8.intermediate_query.dense.bias', 'bert.encoder.layer.2.crossattention.self.key.weight', 'bert.encoder.layer.10.crossattention.self.key.weight', 'bert.encoder.layer.11.output_query.LayerNorm.weight'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. missing keys: 511 unexpected keys: 146 oading music tokenizer Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration. loading audio tokenizer Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration. loading llm

JunZhan2000 commented 7 months ago

No impact on model effectiveness