TinyLLaVA / TinyLLaVA_Factory

A Framework of Small-scale Large Multimodal Models
https://arxiv.org/abs/2402.14289
Apache License 2.0

--tune_type_llm set to full but LLM parameters are not trainable #131

Closed. YenCheHsiao closed this issue 1 week ago.

YenCheHsiao commented 1 week ago

In scripts/train/finetune.sh, --tune_type_llm is set to full, so I expected all parameters in the language model to be trainable. However, the training log shows that only the connector parameters are trainable, while every LLM parameter is reported as 0.

Could you clarify why the LLM parameters are not trainable when --tune_type_llm is set to full? A zero count for every LLM tensor seems unexpected under the full setting.
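
For context, setting --tune_type_llm to full is expected to unfreeze the language model, which in frameworks of this kind usually amounts to flipping requires_grad on the LLM parameters. An illustrative sketch of that behavior (not TinyLLaVA_Factory's actual code; the function name is hypothetical):

```python
def apply_tune_type_llm(model, tune_type: str) -> None:
    # Illustrative only: "full" unfreezes every LLM weight,
    # anything else keeps them all frozen.
    trainable = (tune_type == "full")
    for p in model.language_model.parameters():
        p.requires_grad = trainable
```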

Output:


2024-11-10 11:13:11,825 | INFO: Total Parameters: 9507840, Total Trainable Parameters: 9507840
2024-11-10 11:13:11,826 | INFO: Trainable Parameters:
language_model.model.embed_tokens.weight: 0 parameters
language_model.model.layers.0.self_attn.q_proj.weight: 0 parameters
language_model.model.layers.0.self_attn.q_proj.bias: 0 parameters
language_model.model.layers.0.self_attn.k_proj.weight: 0 parameters
language_model.model.layers.0.self_attn.k_proj.bias: 0 parameters
language_model.model.layers.0.self_attn.v_proj.weight: 0 parameters
language_model.model.layers.0.self_attn.v_proj.bias: 0 parameters
language_model.model.layers.0.self_attn.dense.weight: 0 parameters
language_model.model.layers.0.self_attn.dense.bias: 0 parameters
language_model.model.layers.0.mlp.fc1.weight: 0 parameters
language_model.model.layers.0.mlp.fc1.bias: 0 parameters
language_model.model.layers.0.mlp.fc2.weight: 0 parameters
language_model.model.layers.0.mlp.fc2.bias: 0 parameters
language_model.model.layers.0.input_layernorm.weight: 0 parameters
language_model.model.layers.0.input_layernorm.bias: 0 parameters
... (layers 1 through 31 repeat the same pattern: every self_attn, mlp, and layernorm weight and bias reports 0 parameters) ...
language_model.model.final_layernorm.weight: 0 parameters
language_model.model.final_layernorm.bias: 0 parameters
language_model.lm_head.weight: 0 parameters
language_model.lm_head.bias: 0 parameters
connector._connector.0.weight: 2949120 parameters
connector._connector.0.bias: 2560 parameters
connector._connector.2.weight: 6553600 parameters
connector._connector.2.bias: 2560 parameters
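
For what it's worth, the nonzero counts are internally consistent: the four connector tensors sum exactly to the reported total, and the weight sizes factor as a two-layer MLP (the 1152 and 2560 widths below are inferred from the sizes, not stated in the log):

```python
# The four connector tensor sizes from the log above.
counts = [2949120, 2560, 6553600, 2560]
assert sum(counts) == 9507840   # matches "Total Trainable Parameters"
assert 2949120 == 1152 * 2560   # plausible vision-to-LLM projection
assert 6553600 == 2560 * 2560   # plausible hidden-to-hidden projection
```
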
ZhangXJ199 commented 1 week ago

Are you using ZeRO-3? If so, this is normal. Using ZeRO-2 will display the parameters.
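
Under DeepSpeed ZeRO-3, every parameter tensor is partitioned across ranks, so p.numel() on any single rank returns 0 even for trainable weights; the full element count is kept on the partitioned parameter as ds_numel. A minimal sketch of a partition-aware count (model stands for the assembled TinyLLaVA model; the helper name is illustrative):

```python
def count_trainable_params(model) -> int:
    """Count trainable parameters in a way that survives ZeRO-3 partitioning."""
    total = 0
    for _, p in model.named_parameters():
        if not p.requires_grad:
            continue
        # Under ZeRO-3 the local tensor is empty; DeepSpeed keeps the
        # true element count in p.ds_numel.
        total += p.ds_numel if hasattr(p, "ds_numel") else p.numel()
    return total
```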

YenCheHsiao commented 1 week ago

Are you using ZeRO-3? If so, this is normal. Using ZeRO-2 will display the parameters.

Yes, I am using ZeRO-3, the same as in the provided script. Does using ZeRO-3 prevent the LLM from being trained, or is it just a display issue?
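
Whether a weight is actually trainable is governed by its requires_grad flag, not by the count the logger prints. A quick check along these lines (a sketch; model is again assumed to be the loaded model, and the language_model prefix matches the names in the log above):

```python
# Count LLM tensors whose gradients are enabled. Under ZeRO-3 these
# still print a numel of 0, but requires_grad reflects whether
# --tune_type_llm full actually took effect.
llm_trainable = sum(
    p.requires_grad
    for name, p in model.named_parameters()
    if name.startswith("language_model.")
)
print(f"LLM tensors with requires_grad=True: {llm_trainable}")
```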

ZhangXJ199 commented 1 week ago

This should be just a display issue, but we recommend using ZeRO-2.
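
Switching stages is a config change: the ZeRO stage lives in the DeepSpeed JSON passed to the training launcher. A minimal ZeRO-2 config sketch (the file name and field values here are illustrative, not the repo's shipped defaults):

```python
import json

# "auto" values defer to the Hugging Face Trainer integration.
zero2_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "zero_optimization": {
        "stage": 2,               # partitions optimizer state and gradients,
        "overlap_comm": True,     # but keeps full parameter tensors per rank,
        "contiguous_gradients": True,  # so per-parameter counts print normally
    },
    "bf16": {"enabled": "auto"},
}

with open("zero2.json", "w") as f:
    json.dump(zero2_config, f, indent=2)
```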

YenCheHsiao commented 1 week ago

Using ZeRO-2 does display the LLM parameter counts. Thanks.