alex-petrenko / sample-factory

High throughput synchronous and asynchronous reinforcement learning
https://samplefactory.dev
MIT License

RuntimeError: Error(s) in loading state_dict for ActorCriticSeparateWeights #287

Closed sAz-G closed 6 months ago

sAz-G commented 7 months ago

When I run the following command in quad-swarm-rl:

python -m swarm_rl.enjoy --algo=APPO --env=quadrotor_multi --replay_buffer_sample_prob=0 --quads_use_numba=False --train_dir=/home/saz/quad-swarm-rl/train_dir --experiment=mean_embed_16_8 --quads_view_mode side --quads_render=True

I get the following error:

Traceback (most recent call last):
  File "/home/saz/anaconda3/envs/swarm-rl/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/saz/anaconda3/envs/swarm-rl/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/saz/quad-swarm-rl/swarm_rl/enjoy.py", line 17, in <module>
    sys.exit(main())
  File "/home/saz/quad-swarm-rl/swarm_rl/enjoy.py", line 12, in main
    status = enjoy(cfg)
  File "/home/saz/anaconda3/envs/swarm-rl/lib/python3.8/site-packages/sample_factory/enjoy.py", line 125, in enjoy
    actor_critic.load_state_dict(checkpoint_dict["model"])
  File "/home/saz/anaconda3/envs/swarm-rl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ActorCriticSeparateWeights:
        Missing key(s) in state_dict: "actor_encoder.neighbor_encoder.neighbor_mlp.0.weight", "actor_encoder.neighbor_encoder.neighbor_mlp.0.bias", "actor_encoder.neighbor_encoder.neighbor_mlp.2.weight", "actor_encoder.neighbor_encoder.neighbor_mlp.2.bias", "actor_encoder.neighbor_encoder.neighbor_mlp.4.weight", "actor_encoder.neighbor_encoder.neighbor_mlp.4.bias", "critic_encoder.neighbor_encoder.neighbor_mlp.0.weight", "critic_encoder.neighbor_encoder.neighbor_mlp.0.bias", "critic_encoder.neighbor_encoder.neighbor_mlp.2.weight", "critic_encoder.neighbor_encoder.neighbor_mlp.2.bias", "critic_encoder.neighbor_encoder.neighbor_mlp.4.weight", "critic_encoder.neighbor_encoder.neighbor_mlp.4.bias". 
        Unexpected key(s) in state_dict: "actor_encoder.neighbor_encoder.embedding_mlp.0.weight", "actor_encoder.neighbor_encoder.embedding_mlp.0.bias", "actor_encoder.neighbor_encoder.embedding_mlp.2.weight", "actor_encoder.neighbor_encoder.embedding_mlp.2.bias", "critic_encoder.neighbor_encoder.embedding_mlp.0.weight", "critic_encoder.neighbor_encoder.embedding_mlp.0.bias", "critic_encoder.neighbor_encoder.embedding_mlp.2.weight", "critic_encoder.neighbor_encoder.embedding_mlp.2.bias". 

For training, I changed the hidden layer sizes to 16 for the self encoder and 8 for the neighbor encoder, following the mean-embedding model proposed in the paper Decentralized Control of Quadrotor Swarms with End-to-end Deep Reinforcement Learning.

The training command is a modified version of train_local.sh:

python -m swarm_rl.train \
--env=quadrotor_multi --train_for_env_steps=1000000000 --algo=APPO --use_rnn=False \
--num_workers=4 --num_envs_per_worker=4 --learning_rate=0.0001 --ppo_clip_value=5.0 --recurrence=1 \
--nonlinearity=tanh --actor_critic_share_weights=False --policy_initialization=xavier_uniform \
--adaptive_stddev=False --with_vtrace=False --max_policy_lag=100000000 --rnn_size=16 \
--gae_lambda=1.00 --max_grad_norm=5.0 --exploration_loss_coeff=0.0 --rollout=128 --batch_size=1024 \
--with_pbt=False --normalize_input=False --normalize_returns=False --reward_clip=10 \
--quads_use_numba=True --save_milestones_sec=3600 --anneal_collision_steps=300000000 \
--replay_buffer_sample_prob=0.75 \
--quads_mode=mix --quads_episode_duration=15.0 \
--quads_obs_repr=xyz_vxyz_R_omega \
--quads_neighbor_hidden_size=8 --quads_neighbor_obs_type=pos_vel --quads_collision_hitbox_radius=2.0 \
--quads_collision_falloff_radius=4.0 --quads_collision_reward=5.0 --quads_collision_smooth_max_penalty=10.0 \
--quads_neighbor_encoder_type=mean_embed --quads_neighbor_visible_num=6 \
--quads_use_obstacles=False --quads_use_downwash=True \
--experiment=mean_embed_16_8

How can I avoid the error?
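
A minimal sketch of how one might narrow this down is to print the keys that are actually stored in the checkpoint; the checkpoint path below is illustrative, substitute the .pth file from your own experiment directory:

import torch

# Illustrative path: point this at the checkpoint that enjoy is trying to load.
ckpt_path = "/home/saz/quad-swarm-rl/train_dir/mean_embed_16_8/checkpoint_p0/checkpoint_000000000.pth"

checkpoint = torch.load(ckpt_path, map_location="cpu")
state_dict = checkpoint["model"]

# Show only the neighbor-encoder weights, since those are the ones in the error message.
for key in sorted(state_dict):
    if "neighbor_encoder" in key:
        print(key, tuple(state_dict[key].shape))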

alex-petrenko commented 7 months ago

So it looks like the enjoy code expects the weight tensors to be named actor_encoder.neighbor_encoder.neighbor_mlp.0.weight (notice the neighbor_mlp part), while the checkpoint you're trying to evaluate has actor_encoder.neighbor_encoder.embedding_mlp.0.weight there instead.

Could it be that the model was produced by a different version of the code? Or some variables got renamed?
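
If you can construct the actor-critic the same way your current code does, the mismatch can be reproduced as two set differences, which is exactly what load_state_dict reports. In the sketch below, model stands in for that ActorCriticSeparateWeights instance and is hypothetical here, and state_dict is the checkpoint["model"] dict loaded as in the snippet above. Note that the missing keys end in .0/.2/.4 while the unexpected ones end in .0/.2, so the saved encoder also appears to have a different number of layers, not just a different name.

# `model`: the actor-critic the current code builds (hypothetical placeholder here).
# `state_dict`: the checkpoint["model"] dict loaded from the .pth file.
expected_keys = set(model.state_dict().keys())
saved_keys = set(state_dict.keys())

print("Missing (expected by the code, absent from the checkpoint):")
print(sorted(expected_keys - saved_keys))

print("Unexpected (present in the checkpoint, unknown to the code):")
print(sorted(saved_keys - expected_keys))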

Also, I think this question is better asked on https://github.com/Zhehui-Huang/quad-swarm-rl/, or we can try to summon @Zhehui-Huang here :)

Zhehui-Huang commented 7 months ago

This error means you are loading the wrong model. You can use debug mode to check whether the model is loaded from the correct path. Also, please make sure the code you use for training is the same code you use with the enjoy script.
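
A sketch of that check, assuming the usual layout where checkpoints live under train_dir/<experiment>/checkpoint_p0 and the training config is saved alongside the experiment (file and directory names may differ between versions):

import glob
import json
import os

train_dir = "/home/saz/quad-swarm-rl/train_dir"   # from --train_dir
experiment = "mean_embed_16_8"                     # from --experiment
exp_dir = os.path.join(train_dir, experiment)

# List the checkpoints, oldest to newest, to see which file enjoy would pick up.
checkpoints = sorted(glob.glob(os.path.join(exp_dir, "checkpoint_p0", "*.pth")),
                     key=os.path.getmtime)
for path in checkpoints:
    print(path, os.path.getmtime(path))

# The saved training config (often cfg.json) shows which encoder settings
# were actually in effect when this experiment was trained.
cfg_path = os.path.join(exp_dir, "cfg.json")
if os.path.exists(cfg_path):
    with open(cfg_path) as f:
        cfg = json.load(f)
    print(cfg.get("quads_neighbor_encoder_type"), cfg.get("quads_neighbor_hidden_size"))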

sAz-G commented 6 months ago

I could not figure out why this error happened. Maybe I changed the configuration by mistake after training started (I was running multiple training sessions).

Anyway, I ran several training sessions afterwards and was able to load the resulting models in the simulation.

alex-petrenko commented 6 months ago

Thank you for the update!