DigiRL-agent / digirl

Official repo for paper DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning.
Apache License 2.0
198 stars · 16 forks

Error in loading final checkpoints #11

Closed: mousewu closed this issue 1 month ago

mousewu commented 1 month ago

I want to reproduce the evaluation-only results using the final checkpoints you provided. I replaced the policy_lm entry in default.yaml with the checkpoint path.
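For reference, the override described here would look roughly like this in default.yaml (the path below is a placeholder, not the actual checkpoint location; only the policy_lm key is taken from the report):

```yaml
# default.yaml (excerpt): point policy_lm at the downloaded checkpoint
# directory instead of the default Auto-UI base model.
policy_lm: /path/to/ckpts/general-off2on-digirl
```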

The following error occurs when loading the model:

RuntimeError: Error(s) in loading state_dict for T5ForMultimodalGeneration:
    size mismatch for shared.weight: copying a param with shape torch.Size([32128, 768]) from checkpoint, the shape in current model is torch.Size([32128, 512]).
    size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([32128, 768]) from checkpoint, the shape in current model is torch.Size([32128, 512]).
    size mismatch for encoder.block.0.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 512]).
    size mismatch for encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight: copying a param with shape torch.Size([32, 12]) from checkpoint, the shape in current model is torch.Size([32, 8]).
    size mismatch for encoder.block.0.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 2048]) from checkpoint, the shape in current model is torch.Size([512, 2048]).
    size mismatch for encoder.block.0.layer.1.layer_norm.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([512]).
    [... the same 768-vs-512 size mismatch is reported for every remaining SelfAttention, EncDecAttention, DenseReluDense, and layer_norm weight in all six encoder and decoder blocks, plus encoder.final_layer_norm and decoder.final_layer_norm ...]
    size mismatch for lm_head.weight: copying a param with shape torch.Size([32128, 768]) from checkpoint, the shape in current model is torch.Size([32128, 512]).
    size mismatch for image_dense.weight: copying a param with shape torch.Size([768, 1408]) from checkpoint, the shape in current model is torch.Size([512, 1408]).
    size mismatch for image_dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for mha_layer.in_proj_weight: copying a param with shape torch.Size([2304, 768]) from checkpoint, the shape in current model is torch.Size([1536, 512]).
    size mismatch for mha_layer.in_proj_bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([1536]).
    size mismatch for mha_layer.out_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 512]).
    size mismatch for mha_layer.out_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for gate_dense.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
    size mismatch for gate_dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([512]).
    You may consider adding ignore_mismatched_sizes=True in the model from_pretrained method.

mousewu commented 1 month ago

I guess this is because some configuration entries are missing from the config.json file.
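One way to test this guess is to compare the hidden size declared in the checkpoint's config.json against the width of the saved tensors. A minimal sketch: the 768-vs-512 figures are taken from the traceback above, and the helper below is hypothetical, not repo code.

```python
def check_hidden_size(shape_map, config_d_model):
    """Compare the d_model declared in config.json with the width of the
    saved shared embedding; a disagreement reproduces the size-mismatch
    error.  shape_map maps parameter names to shape tuples, e.g. built
    with {k: tuple(v.shape) for k, v in torch.load(path).items()}."""
    saved_d_model = shape_map["shared.weight"][1]
    return saved_d_model, config_d_model

# Synthetic stand-ins: the checkpoint stores t5-base-sized weights
# (d_model=768) while the config describes a t5-small-sized model
# (d_model=512), the exact disagreement in the traceback above.
saved, declared = check_hidden_size({"shared.weight": (32128, 768)}, 512)
print(saved, declared)  # -> 768 512: the config does not match the weights
```

If the two numbers disagree like this, the fix is to make config.json describe the same architecture the weights were saved from; the ignore_mismatched_sizes=True workaround suggested by the error message would instead randomly re-initialize every mismatched tensor, discarding the fine-tuned weights.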

BiEchi commented 1 month ago

Thanks for your interest in our work. Would you like to share a minimum working example? It should include:

  1. The command you're using.
  2. pip show digirl

Also, please remember to use the command cd scripts && python run.py --config-path config/main --config-name digirl_online. Note that digirl_online.yaml cascades from default.yaml.

mousewu commented 1 month ago

I'm running on Windows.

(digirl) E:\digirl\scripts>python run.py --config-path config\main --config-name eval_only

task_set: general
task_split: test
eval_sample_mode: sequential
max_steps: 10
huggingface_token: hf_pLDobMHpWvhaXPgOKrOPH
wandb_key: ''
gemini_key: ''
policy_lm: F:\digirl\ckpts\general-off2on-digirl\trainer.pt
critic_lm: roberta-base
capacity: 2000
epochs: 5
batch_size: 4
bsize: 8
rollout_size: 16
grad_accum_steps: 32
warmup_iter: 0
actor_epochs: 20
trajectory_critic_epochs: 5
lm_lr: 0.0001
critic_lr: 0.0001
max_grad_norm: 0.01
gamma: 0.5
use_lora: false
agent_name: autoui
do_sample: true
temperature: 1.0
tau: 0.01
max_new_tokens: 128
record: false
use_wandb: false
entity_name: ''
project_name: ''
android_avd_home: D:\Users\8028\.android\avd\test_Android.avd
emulator_path: E:\Android\sdk\emulator
adb_path: E:\Android\sdk\platform-tools
cache_dir: D:\Users\8028\.cache
assets_path: E:\digirl\digirl\environment\android\assets\task_set
train_algorithm: digirl
save_path: F:\digirl\ckpts\general-off2on-digirl\trainer.pt\
task_mode: evaluate
parallel: single
eval_iterations: 6
save_freq: 3

The token has not been saved to the git credentials helper. Pass add_to_git_credential=True in this function directly or --add-to-git-credential if using via huggingface-cli if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to D:\Users\8028\.cache\huggingface\token
Login successful

Agent: autoui
Evaluation mode
Error executing job with overrides: []
Traceback (most recent call last):
  File "E:\digirl\scripts\run.py", line 67, in main
    agent = AutoUIAgent(device=device, accelerator=accelerator,
  File "e:\digirl\digirl\models\autoui_agent.py", line 28, in __init__
    self.model = T5ForMultimodalGeneration.from_pretrained(policy_lm, cache_dir=cache_dir).to(device)
  File "E:\Anaconda3\envs\digirl\lib\site-packages\transformers\modeling_utils.py", line 3850, in from_pretrained
    ) = cls._load_pretrained_model(
  File "E:\Anaconda3\envs\digirl\lib\site-packages\transformers\modeling_utils.py", line 4335, in _load_pretrained_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for T5ForMultimodalGeneration:
    size mismatch for encoder.block.0.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]).
    size mismatch for encoder.block.0.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]).
    size mismatch for encoder.block.0.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]).
    size mismatch for encoder.block.0.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([768, 512]).
    size mismatch for encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight: copying a param with shape torch.Size([32, 12]) from checkpoint, the shape in current model is torch.Size([32, 8]).
    size mismatch for encoder.block.0.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 2048]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    [... analogous size mismatches are reported for the remaining encoder and decoder SelfAttention, EncDecAttention, and DenseReluDense weights ...]
size mismatch for decoder.block.2.layer.1.EncDecAttention.v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]). size mismatch for decoder.block.2.layer.1.EncDecAttention.o.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([768, 512]). size mismatch for decoder.block.2.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 2048]) from checkpoint, the shape in current model is torch.Size([768, 768]). size mismatch for decoder.block.3.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]). size mismatch for decoder.block.3.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]). size mismatch for decoder.block.3.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]). size mismatch for decoder.block.3.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([768, 512]). size mismatch for decoder.block.3.layer.1.EncDecAttention.q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]). size mismatch for decoder.block.3.layer.1.EncDecAttention.k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]). size mismatch for decoder.block.3.layer.1.EncDecAttention.v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]). 
size mismatch for decoder.block.3.layer.1.EncDecAttention.o.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([768, 512]). size mismatch for decoder.block.3.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 2048]) from checkpoint, the shape in current model is torch.Size([768, 768]). size mismatch for decoder.block.4.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]). size mismatch for decoder.block.4.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]). size mismatch for decoder.block.4.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]). size mismatch for decoder.block.4.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([768, 512]). size mismatch for decoder.block.4.layer.1.EncDecAttention.q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]). size mismatch for decoder.block.4.layer.1.EncDecAttention.k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]). size mismatch for decoder.block.4.layer.1.EncDecAttention.v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]). size mismatch for decoder.block.4.layer.1.EncDecAttention.o.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([768, 512]). 
size mismatch for decoder.block.4.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 2048]) from checkpoint, the shape in current model is torch.Size([768, 768]). size mismatch for decoder.block.5.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]). size mismatch for decoder.block.5.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]). size mismatch for decoder.block.5.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]). size mismatch for decoder.block.5.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([768, 512]). size mismatch for decoder.block.5.layer.1.EncDecAttention.q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]). size mismatch for decoder.block.5.layer.1.EncDecAttention.k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]). size mismatch for decoder.block.5.layer.1.EncDecAttention.v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]). size mismatch for decoder.block.5.layer.1.EncDecAttention.o.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([768, 512]). size mismatch for decoder.block.5.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 2048]) from checkpoint, the shape in current model is torch.Size([768, 768]). You may consider adding ignore_mismatched_sizes=True in the model from_pretrained method.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

(digirl) E:\>pip show digirl
Name: digirl
Version: 0.1.0
Summary: Research code for digirl
Home-page: https://github.com/DigiRL-agent/digirl
Author: Hao Bai
Author-email:
License: MIT
Location: e:\digirl
Editable project location: e:\digirl
Requires: accelerate, annotated-types, appium-python-client, beautifulsoup4, blis, brotlipy, catalogue, certifi, cffi, charset-normalizer, click, cloudpathlib, cloudpickle, confection, contourpy, cryptography, cycler, cymem, Farama-Notifications, fonttools, google-generativeai, gradio, gym, gym-notices, gymnasium, hashids, hydra-core, Jinja2, jupyter, kiwisolver, langcodes, MarkupSafe, matplotlib, mementos, memory_profiler, more-itertools, murmurhash, networkx, numpy, openai, packaging, peft, Pillow, pluggy, preshed, prompt-toolkit, pycosat, pycparser, pydantic, pydantic_core, pyinstrument, pyOpenSSL, pyparsing, PySocks, python-dateutil, requests, ruamel.yaml, ruamel.yaml.clib, sentencepiece, six, smart-open, spacy, spacy-legacy, spacy-loggers, srsly, TatSu, tenacity, termcolor, thinc, toolz, torch, tqdm, transformers, typer, typing_extensions, urllib3, wandb, wasabi, wcwidth, weasel, zstandard
Required-by:
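The mismatches in the traceback all follow one pattern: the checkpoint was saved from a t5-base-sized model (hidden size 768, 12 heads), while the code instantiated a t5-small-sized one (hidden size 512, 8 heads). A quick way to confirm this is to read the dimensions directly off the saved tensors. The sketch below is illustrative, not part of the digirl codebase; the shapes are hard-coded from the error log above, but in practice they would come from `torch.load(...)` on the checkpoint:

```python
# Shapes copied from the error log above; in practice, build this dict from
# {name: tuple(t.shape) for name, t in torch.load(ckpt_path).items()}.
checkpoint_shapes = {
    "shared.weight": (32128, 768),
    "encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight": (32, 12),
    "encoder.block.0.layer.1.DenseReluDense.wo.weight": (768, 2048),
}

def infer_t5_dims(shapes):
    """Derive the T5 config values implied by the saved tensors."""
    vocab_size, d_model = shapes["shared.weight"]           # embedding table
    _, num_heads = shapes[
        "encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight"
    ]                                                       # one bias per head
    _, d_ff = shapes["encoder.block.0.layer.1.DenseReluDense.wo.weight"]
    return {"vocab_size": vocab_size, "d_model": d_model,
            "num_heads": num_heads, "d_ff": d_ff}

print(infer_t5_dims(checkpoint_shapes))
# → {'vocab_size': 32128, 'd_model': 768, 'num_heads': 12, 'd_ff': 2048}
```

These inferred values (d_model=768, num_heads=12) are exactly what needs to go into config.json, as the resolution further down in this thread shows. Note that `ignore_mismatched_sizes=True`, which the error message suggests, would silently re-initialize every mismatched weight rather than load the checkpoint, so it is not an appropriate fix here.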

BiEchi commented 1 month ago

It works on my end. Did you make sure the checkpoint you're loading matches the train_algorithm in your config? That is, if you're using the filteredBC checkpoint, you should use "filteredbc" in eval_only.yaml; if you're using the digirl checkpoint, you should use "digirl" in eval_only.yaml.
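For concreteness, the relevant fields might look like this (a sketch: `policy_lm` and `train_algorithm` are the two fields mentioned in this thread, the path is a placeholder, and the surrounding layout of eval_only.yaml may differ in your copy of the repo):

```yaml
# eval_only.yaml (fragment)
policy_lm: /path/to/downloaded/checkpoint   # placeholder local path
train_algorithm: "digirl"                   # must match the checkpoint: "digirl" or "filteredbc"
```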

mousewu commented 1 month ago

Adding the following information to config.json solved my problem:

{
  "_name_or_path": "jackbai/general_off2on_digirl",
  "architectures": ["T5ForMultimodalGeneration"],
  "model_type": "t5",
  "d_model": 768,
  "num_heads": 12
}
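For anyone hitting the same mismatch, a small script can merge those fields into the checkpoint's config.json so that from_pretrained builds the model at the checkpoint's dimensions. This is a sketch: `patch_t5_config` is a hypothetical helper, not part of the digirl codebase, and the checkpoint directory path is a placeholder:

```python
import json
from pathlib import Path

def patch_t5_config(ckpt_dir: str) -> dict:
    """Merge t5-base dimensions into a checkpoint's config.json.

    The values below come from the resolution in this thread:
    d_model=768 and num_heads=12 match the saved tensor shapes.
    """
    path = Path(ckpt_dir) / "config.json"
    # Start from the existing config if there is one, else from scratch.
    config = json.loads(path.read_text()) if path.exists() else {}
    config.update({
        "_name_or_path": "jackbai/general_off2on_digirl",
        "architectures": ["T5ForMultimodalGeneration"],
        "model_type": "t5",
        "d_model": 768,
        "num_heads": 12,
    })
    path.write_text(json.dumps(config, indent=2))
    return config

# Usage (placeholder path):
# patch_t5_config("/path/to/downloaded/checkpoint")
```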