facebookresearch / BenchMARL

A collection of MARL benchmarks based on TorchRL
https://benchmarl.readthedocs.io/
MIT License

Mismatched TensorDict keys causing RuntimeError when training with `collect_with_grad: True` #112

Closed robj0nes closed 3 months ago

robj0nes commented 3 months ago

This error occurs when running on an NVIDIA Tesla P100. I have also tested on an Apple M3, where the error is not thrown.

Experiment config: `Algorithm: maddpg, Task: vmas/navigation`

Loaded config:

```yaml
experiment:
  sampling_device: cuda
  train_device: cuda
  buffer_device: cuda
  share_policy_params: false
  prefer_continuous_actions: true
  collect_with_grad: true
  gamma: 0.9
  lr: 0.0005
  adam_eps: 1.0e-06
  clip_grad_norm: true
  clip_grad_val: 5.0
  soft_target_update: true
  polyak_tau: 0.005
  hard_target_update_frequency: 5
  exploration_eps_init: 0.8
  exploration_eps_end: 0.01
  exploration_anneal_frames: null
  max_n_iters: 1000
  max_n_frames: null
  on_policy_collected_frames_per_batch: 6000
  on_policy_n_envs_per_worker: 10
  on_policy_n_minibatch_iters: 45
  on_policy_minibatch_size: 400
  off_policy_collected_frames_per_batch: 6000
  off_policy_n_envs_per_worker: 10
  off_policy_n_optimizer_steps: 1000
  off_policy_train_batch_size: 128
  off_policy_memory_size: 1000000
  off_policy_init_random_frames: 0
  evaluation: true
  render: false
  evaluation_interval: 60000
  evaluation_episodes: 100
  evaluation_deterministic_actions: true
  loggers: []
  create_json: true
  save_folder: null
  restore_file: null
  checkpoint_interval: 600000
  checkpoint_at_end: false
  keep_checkpoints_num: 3
algorithm:
  share_param_critic: true
  loss_function: l2
  delay_value: true
  use_tanh_mapping: true
task:
  max_steps: 100
  n_agents: 3
  collisions: true
  agents_with_same_goal: 1
  observe_all_goals: false
  shared_rew: false
  split_goals: false
  lidar_range: 0.35
  agent_radius: 0.1
model:
  name: mlp
  num_cells:
```
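
For context, a roughly equivalent setup can be reproduced through BenchMARL's Python API instead of the Hydra CLI. The sketch below follows the entry points documented in the BenchMARL README (`MaddpgConfig`, `VmasTask`, `ExperimentConfig`, `MlpConfig`) and only overrides the fields relevant to this report, so treat it as illustrative rather than the exact script used:

```python
# Illustrative sketch only: sets up the reported experiment via BenchMARL's
# Python API. Class names follow the library documentation.
from benchmarl.algorithms import MaddpgConfig
from benchmarl.environments import VmasTask
from benchmarl.experiment import Experiment, ExperimentConfig
from benchmarl.models.mlp import MlpConfig

experiment_config = ExperimentConfig.get_from_yaml()
# Overrides matching the loaded config above; collect_with_grad is the flag
# that triggers the error on the P100 machine.
experiment_config.sampling_device = "cuda"
experiment_config.train_device = "cuda"
experiment_config.buffer_device = "cuda"
experiment_config.collect_with_grad = True

experiment = Experiment(
    task=VmasTask.NAVIGATION.get_from_yaml(),
    algorithm_config=MaddpgConfig.get_from_yaml(),
    model_config=MlpConfig.get_from_yaml(),
    seed=0,
    config=experiment_config,
)
experiment.run()
```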

Full Hydra stack trace:

```
Traceback (most recent call last):
  File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/tensordict/_torch_func.py", line 425, in _stack
    keys = _check_keys(list_of_tensordicts, strict=True)
  File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/tensordict/utils.py", line 1523, in _check_keys
    raise KeyError(
KeyError: "got keys {'action', 'episode_reward', 'info', 'observation', 'param', 'reward'} and {'action', 'episode_reward', 'info', 'observation', 'param'} which are incompatible"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/storage/scratch/vd20433/BenchMARL/benchmarl/run.py", line 42, in <module>
    hydra_experiment()
  File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
  File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/mnt/storage/scratch/vd20433/BenchMARL/benchmarl/run.py", line 38, in hydra_experiment
    experiment.run()
  File "/mnt/storage/scratch/vd20433/BenchMARL/benchmarl/experiment/experiment.py", line 553, in run
    raise err
  File "/mnt/storage/scratch/vd20433/BenchMARL/benchmarl/experiment/experiment.py", line 545, in run
    self._collection_loop()
  File "/mnt/storage/scratch/vd20433/BenchMARL/benchmarl/experiment/experiment.py", line 575, in _collection_loop
    batch = self.rollout_env.rollout(
  File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/torchrl/envs/common.py", line 2567, in rollout
    out_td = torch.stack(tensordicts, len(batch_size), out=out)
  File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/tensordict/base.py", line 388, in __torch_function__
    return TD_HANDLED_FUNCTIONS[func](*args, **kwargs)
  File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/tensordict/_torch_func.py", line 496, in _stack
    out = {
  File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/tensordict/_torch_func.py", line 497, in <dictcomp>
    key: stack_fn(key, values, is_not_init, is_tensor)
  File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/tensordict/_torch_func.py", line 494, in stack_fn
    return _stack(values, dim, maybe_dense_stack=maybe_dense_stack)
  File "/mnt/storage/scratch/vd20433/miniconda3/envs/benchmarl/lib/python3.10/site-packages/tensordict/_torch_func.py", line 432, in _stack
    raise RuntimeError(
RuntimeError: The sets of keys in the tensordicts to stack are exclusive. Consider using LazyStackedTensorDict.maybe_dense_stack instead.
```
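For what it's worth, the failure reduces to `torch.stack` being asked to densely stack per-step tensordicts whose key sets differ (one carries a `'reward'` entry, the other does not). Below is a minimal sketch of that failure mode outside BenchMARL, with illustrative key names and shapes; the `LazyStackedTensorDict.maybe_dense_stack` workaround is the one named in the error message, and the exact stacking behaviour depends on the tensordict version:

```python
import torch
from tensordict import TensorDict, LazyStackedTensorDict

# Two per-step tensordicts whose key sets differ, mirroring the trace:
# one carries a "reward" entry, the other does not.
td_a = TensorDict(
    {"observation": torch.zeros(3), "reward": torch.zeros(1)}, batch_size=[]
)
td_b = TensorDict({"observation": torch.zeros(3)}, batch_size=[])

try:
    # Dense stacking requires identical key sets; with mismatched keys this
    # raises the error reported above (exact behaviour depends on the
    # tensordict version and its lazy/dense stacking defaults).
    torch.stack([td_a, td_b], 0)
except (KeyError, RuntimeError) as err:
    print(f"stack failed: {err}")

# The workaround named in the error message builds a lazy stack that
# tolerates heterogeneous key sets instead of a dense one.
lazy = LazyStackedTensorDict.maybe_dense_stack([td_a, td_b], 0)
print(lazy)
```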

matteobettini commented 3 months ago

This PR should fix it; let me know if not.