When I try to load gpt-neo-125M using train/trainer.py, the following log shows up. I wonder, is this OK? When I change Re_gptForCausalLM to GPTNeoForCausalLM, the warning disappears. For reference, a minimal sketch of the two loading paths I compared is shown after the log.
Some weights of Re_gptForCausalLM were not initialized from the model checkpoint at EleutherAI/gpt-neo-125M and are newly initialized: ['transformer.h.5.cross_attn.fn.cross_attn.to_q.weight', 'transformer.encoder.layers.1.0.fn.to_q.weight', 'transformer.encoder.layers.1.0.fn.to_k.weight', 'transformer.encoder.layers.0.1.fn.to_out.weight', 'transformer.encoder.layers.0.0.fn.to_v.weight', 'transformer.encoder.layers.0.1.fn.to_v.weight', 'transformer.encoder.layers.0.0.fn.to_k.weight', 'transformer.encoder.layers.1.1.fn.to_out.bias', 'transformer.encoder.layers.0.2.fn.ff.0.weight', 'transformer.encoder.layers.0.0.fn.to_out.weight', 'transformer.rotary_pos_emb.inv_freq', 'transformer.h.5.cross_attn.fn.cross_attn.null_v', 'transformer.encoder.layers.1.1.fn.to_q.weight', 'transformer.encoder.layers.1.1.fn.to_k.weight', 'transformer.encoder.layers.1.2.norm.weight', 'transformer.encoder.layers.1.1.fn.to_v.weight', 'transformer.encoder.layers.1.0.fn.to_v.weight', 'transformer.encoder.layers.1.2.fn.ff.3.bias', 'transformer.encoder.layers.0.1.fn.to_k.weight', 'transformer.encoder.layers.1.2.fn.ff.0.weight', 'transformer.encoder.norm_out.weight', 'transformer.encoder.project_out.bias', 'transformer.encoder.layers.0.1.fn.to_q.weight', 'transformer.encoder.layers.0.2.norm.weight', 'transformer.encoder.layers.0.1.norm.weight', 'transformer.encoder.rotary_pos_emb.inv_freq', 'transformer.encoder.layers.1.1.fn.to_out.weight', 'transformer.h.5.cross_attn.fn.cross_attn.to_v.weight', 'transformer.encoder.layers.1.0.fn.to_out.weight', 'transformer.h.5.cross_attn.fn.cross_attn.null_k', 'transformer.h.5.cross_attn.fn.cross_attn.to_out.bias', 'transformer.encoder.layers.1.2.fn.ff.3.weight', 'transformer.encoder.layers.1.1.norm.weight', 'transformer.encoder.layers.0.2.fn.ff.3.bias', 'transformer.h.5.cross_attn.norm.weight', 'transformer.encoder.layers.1.2.fn.ff.0.bias', 'transformer.encoder.layers.0.2.fn.ff.0.bias', 'transformer.encoder.layers.0.1.fn.to_out.bias', 'transformer.encoder.layers.0.2.fn.ff.3.weight', 'transformer.encoder.layers.0.0.fn.to_q.weight', 'transformer.encoder.layers.1.0.fn.to_out.bias', 'transformer.encoder.project_out.weight', 'transformer.h.5.cross_attn.fn.cross_attn.to_k.weight', 'transformer.h.5.cross_attn.fn.cross_attn.to_out.weight', 'transformer.encoder.layers.1.0.norm.weight', 'transformer.encoder.layers.0.0.norm.weight', 'transformer.encoder.layers.0.0.fn.to_out.bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
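Here is the sketch (the import path for Re_gptForCausalLM is an assumption; adjust it to wherever the class lives in this repo):

```python
from transformers import GPTNeoForCausalLM

# Loading with the stock class: every parameter exists in the checkpoint,
# so no "newly initialized" warning is emitted.
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

# Loading with the custom class (hypothetical import path): the encoder and
# cross-attention modules listed in the warning are not in the checkpoint,
# so transformers initializes them randomly and prints the message above.
# from train.model import Re_gptForCausalLM
# model = Re_gptForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")
```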