KeyError: 'lang_goal_tokens'

Thank you very much for your work. Below is a bug I encountered while trying to reproduce it.


I downloaded and decompressed the data and replay for a single task. Because np.bool is deprecated in numpy, I then replaced all instances of np.bool with np.bool_. When I ran the training code again with python train.py --exp_cfg_path configs/all_100.yaml --device 0, I got KeyError: 'lang_goal_tokens'. Did I do something wrong?

configs/all_100.yaml

exp_id: rvt
tasks: slide_block_to_color_target
bs: 3
num_workers: 3
epochs: 15
sample_distribution_mode: task_uniform
peract:
  lr: 1e-4
  warmup_steps: 2000
  optimizer_type: lamb
  lr_cos_dec: True
  transform_augmentation_xyz: [0.125, 0.125, 0.125]
  transform_augmentation_rpy: [0.0, 0.0, 45.0]
rvt:
  place_with_mean: False

logs

(rvt-zzy) root@7708b7cca4e2:/data/zzy/RVT/rvt# python train.py --exp_cfg_path configs/all_100.yaml --device 0              
dict(exp_cfg)={'agent': 'our', 'tasks': 'slide_block_to_color_target', 'exp_id': 'rvt', 'resume': '', 'bs': 3, 'epochs': 15, 'num_workers': 3, 'sample_distribution_mode': 'task_uniform', 'peract': CfgNode({'lambda_weight_l2': 1e-06, 'lr': 0.00030000000000000003, 'optimizer_type': 'lamb', 'warmup_steps': 2000, 'lr_cos_dec': True, 'add_rgc_loss': True, 'num_rotation_classes': 72, 'transform_augmentation': True, 'transform_augmentation_xyz': [0.125, 0.125, 0.125], 'transform_augmentation_rpy': [0.0, 0.0, 45.0]}), 'rvt': CfgNode({'gt_hm_sigma': 1.5, 'img_aug': 0.1, 'place_with_mean': False, 'move_pc_in_bound': True}), 'peract_official': CfgNode({'cfg_path': 'configs/peract_official_config.yaml'})}
Training on 1 tasks: ['slide_block_to_color_target']
[Info] Replay dataset already exists in the disk: replay/replay_train/slide_block_to_color_target
Created Dataset. Time Cost: 0.21861758629480998 minutes
MVT Vars: {'training': True, '_parameters': OrderedDict(), '_buffers': OrderedDict(), '_non_persistent_buffers_set': set(), '_backward_hooks': OrderedDict(), '_is_full_backward_hook': None, '_forward_hooks': OrderedDict(), '_forward_pre_hooks': OrderedDict(), '_state_dict_hooks': OrderedDict(), '_load_state_dict_pre_hooks': OrderedDict(), '_load_state_dict_post_hooks': OrderedDict(), '_modules': OrderedDict(), 'depth': 8, 'img_feat_dim': 3, 'img_size': 220, 'add_proprio': True, 'proprio_dim': 4, 'add_lang': True, 'lang_dim': 512, 'lang_len': 77, 'im_channels': 64, 'img_patch_size': 11, 'final_dim': 64, 'attn_dropout': 0.1, 'decoder_dropout': 0.0, 'self_cross_ver': 1, 'add_corr': True, 'add_pixel_loc': True, 'add_depth': True, 'pe_fix': True}
Start training ...
Rank [0], Epoch [0]: Training on train dataset
  0%|                                                                                                                                                         | 0/53333 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 300, in <module>
    mp.spawn(experiment, args=(cmd_args, devices, port), nprocs=len(devices), join=True)
  File "/root/anaconda3/envs/rvt-zzy/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/root/anaconda3/envs/rvt-zzy/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/root/anaconda3/envs/rvt-zzy/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/root/anaconda3/envs/rvt-zzy/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/data/zzy/RVT/rvt/train.py", line 260, in experiment
    out = train(agent, train_dataset, TRAINING_ITERATIONS, rank)
  File "/data/zzy/RVT/rvt/train.py", line 54, in train
    raw_batch = next(data_iter)
  File "/root/anaconda3/envs/rvt-zzy/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/root/anaconda3/envs/rvt-zzy/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/root/anaconda3/envs/rvt-zzy/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/root/anaconda3/envs/rvt-zzy/lib/python3.8/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/root/anaconda3/envs/rvt-zzy/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/root/anaconda3/envs/rvt-zzy/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 39, in fetch
    data = next(self.dataset_iter)
  File "/data/zzy/RVT/rvt/libs/YARR/yarr/replay_buffer/wrappers/pytorch_replay_buffer.py", line 41, in _generator
    yield self._replay_buffer.sample_transition_batch(pack_in_dict=True, distribution_mode = self._sample_distribution_mode)
  File "/data/zzy/RVT/rvt/libs/YARR/yarr/replay_buffer/uniform_replay_buffer.py", line 803, in sample_transition_batch
    store = self._get_from_disk(
  File "/data/zzy/RVT/rvt/libs/YARR/yarr/replay_buffer/uniform_replay_buffer.py", line 456, in _get_from_disk
    store[k][i] = v # NOTE: potential bug here, should % self._replay_capacity
KeyError: 'lang_goal_tokens'

Hi @LemonWade ,

Happy to help. I don't think you did anything wrong, but I am unable to reproduce the issue on my end. The code looks fine, so my hunch is that it has something to do with the data.
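One way to check the data hunch is to look at which keys the replay files on disk actually contain. The traceback fails at store[k][i] = v inside _get_from_disk, which suggests that a key read from disk ('lang_goal_tokens') is not among the keys the buffer expects. Below is a minimal sketch, not part of the repository: the helper name collect_replay_keys is hypothetical, and it assumes each transition is a pickled dict stored as a *.replay file in the task's replay folder.

```python
import pickle
from collections import Counter
from pathlib import Path


def collect_replay_keys(replay_dir, limit=20):
    """Count how often each key appears in the first `limit` replay files.

    Hypothetical helper. Assumption: every file matching "*.replay" in
    `replay_dir` is a pickled dict holding one transition.
    Only unpickle files you trust; real replay files also need numpy installed.
    """
    counts = Counter()
    files = sorted(Path(replay_dir).glob("*.replay"))[:limit]
    for path in files:
        with open(path, "rb") as f:
            transition = pickle.load(f)
        # Record every key stored in this transition.
        counts.update(transition.keys())
    return len(files), counts
```

Calling collect_replay_keys("replay/replay_train/slide_block_to_color_target") from the rvt folder returns the number of files inspected and a per-key count. If 'lang_goal_tokens' shows up there, the replay on disk was probably written with a different set of keys than the current code expects, and re-downloading or regenerating the replay would be the next thing to try.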

I followed these steps:

Downloaded train/slide_block_to_color_target.zip from here, placed it under the folder rvt/data/train, and ran unzip slide_block_to_color_target.zip.
Downloaded replay_train/slide_block_to_color_target.tar.xz from here, placed it under rvt/replay/replay_train, and ran tar -xf slide_block_to_color_target.tar.xz.
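To confirm that both archives were extracted to the places described above, a quick check along these lines may help. It is only a sketch: missing_task_dirs is a hypothetical helper, and the two paths are taken from the steps above and are assumed to be relative to the rvt folder.

```python
from pathlib import Path


def missing_task_dirs(root, task):
    """Return the expected task folders that do not exist under `root`.

    Hypothetical helper. The expected locations mirror the steps above:
    <root>/data/train/<task> and <root>/replay/replay_train/<task>.
    """
    expected = [
        Path(root) / "data" / "train" / task,
        Path(root) / "replay" / "replay_train" / task,
    ]
    # Keep only the folders that are absent (or are not directories).
    return [str(p) for p in expected if not p.is_dir()]
```

Running missing_task_dirs(".", "slide_block_to_color_target") from the rvt folder should return an empty list when both folders are in place.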

Here is my folder structure:

tree -L 3
.
├── config.py
├── configs
│   ├── all.yaml
│   └── peract_official_config.yaml
├── data
│   └── train
│       ├── slide_block_to_color_target
│       └── slide_block_to_color_target.zip
├── eval_internal.py
├── eval.py
├── libs
│   ├── peract
│   │   ├── agents
│   │   ├── ARM_LICENSE
│   │   ├── conf
│   │   ├── eval.py
│   │   ├── helpers
│   │   ├── LICENSE
│   │   ├── media
│   │   ├── model-card.md
│   │   ├── README.md
│   │   ├── requirements.txt
│   │   ├── run_seed_fn.py
│   │   ├── scripts
│   │   ├── setup.py
│   │   ├── train.py
│   │   └── voxel
│   ├── peract_colab
│   │   ├── peract_colab
│   │   ├── peract_colab.egg-info
│   │   └── setup.py
│   ├── PyRep
│   │   ├── build
│   │   ├── cffi_build
│   │   ├── docs
│   │   ├── examples
│   │   ├── LICENSE
│   │   ├── pyrep
│   │   ├── PyRep.egg-info
│   │   ├── README.md
│   │   ├── requirements.txt
│   │   ├── robot_ttms
│   │   ├── setup.py
│   │   ├── system
│   │   ├── tests
│   │   ├── tools
│   │   └── tutorials
│   ├── RLBench
│   │   ├── examples
│   │   ├── LICENSE
│   │   ├── readme_files
│   │   ├── README.md
│   │   ├── requirements.txt
│   │   ├── rlbench
│   │   ├── rlbench.egg-info
│   │   ├── setup.py
│   │   ├── tests
│   │   ├── tools
│   │   ├── travisci_generate_index.py
│   │   ├── travisci_run_tests.py
│   │   └── tutorials
│   └── YARR
│       ├── LICENSE
│       ├── logo.png
│       ├── README.md
│       ├── requirements.txt
│       ├── setup.py
│       ├── yarr
│       └── yarr.egg-info
├── models
│   ├── peract_official.py
│   ├── __pycache__
│   │   ├── peract_official.cpython-38.pyc
│   │   └── rvt_agent.cpython-38.pyc
│   └── rvt_agent.py
├── mvt
│   ├── attn.py
│   ├── augmentation.py
│   ├── aug_utils.py
│   ├── config.py
│   ├── __init__.py
│   ├── mvt.py
│   ├── mvt_single.py
│   ├── __pycache__
│   │   ├── attn.cpython-38.pyc
│   │   ├── augmentation.cpython-38.pyc
│   │   ├── aug_utils.cpython-38.pyc
│   │   ├── config.cpython-38.pyc
│   │   ├── __init__.cpython-38.pyc
│   │   ├── mvt.cpython-38.pyc
│   │   ├── mvt_single.cpython-38.pyc
│   │   ├── renderer.cpython-38.pyc
│   │   └── utils.cpython-38.pyc
│   ├── renderer.py
│   └── utils.py
├── __pycache__
│   └── config.cpython-38.pyc
├── replay
│   └── replay_train
│       ├── slide_block_to_color_target
│       └── slide_block_to_color_target.tar.xz
├── runs
│   └── rvt_tasks_slide_block_to_color_target
│       ├── args.yaml
│       ├── events.out.tfevents.1691690765.neil
│       ├── events.out.tfevents.1691690946.neil
│       ├── exp_cfg.yaml
│       └── mvt_cfg.yaml
├── train.py
└── utils
    ├── custom_rlbench_env.py
    ├── dataset.py
    ├── ddp_utils.py
    ├── get_dataset.py
    ├── __init__.py
    ├── lr_sched_utils.py
    ├── peract_utils.py
    ├── __pycache__
    │   ├── custom_rlbench_env.cpython-38.pyc
    │   ├── dataset.cpython-38.pyc
    │   ├── ddp_utils.cpython-38.pyc
    │   ├── get_dataset.cpython-38.pyc
    │   ├── __init__.cpython-38.pyc
    │   ├── lr_sched_utils.cpython-38.pyc
    │   ├── peract_utils.cpython-38.pyc
    │   ├── rlbench_planning.cpython-38.pyc
    │   └── rvt_utils.cpython-38.pyc
    ├── rlbench_planning.py
    └── rvt_utils.py

Here is the log:

╰─$ python3 train.py --exp_cfg_path configs/all.yaml --device 0 --exp_cfg_opts "tasks slide_block_to_color_target"                                                                                      
dict(exp_cfg)={'agent': 'our', 'tasks': 'slide_block_to_color_target', 'exp_id': 'rvt_tasks_slide_block_to_color_target', 'resume': '', 'bs': 3, 'epochs': 15, 'num_workers': 3, 'sample_distribution_mode': 'task_uniform', 'peract': CfgNode({'lambda_weight_l2': 1e-06, 'lr': 0.00030000000000000003, 'optimizer_type': 'lamb', 'warmup_steps': 2000, 'lr_cos_dec': True, 'add_rgc_loss': True, 'num_rotation_classes': 72, 'transform_augmentation': True, 'transform_augmentation_xyz': [0.125, 0.125, 0.125], 'transform_augmentation_rpy': [0.0, 0.0, 45.0]}), 'rvt': CfgNode({'gt_hm_sigma': 1.5, 'img_aug': 0.1, 'place_with_mean': False, 'move_pc_in_bound': True}), 'peract_official': CfgNode({'cfg_path': 'configs/peract_official_config.yaml'})}
Training on 1 tasks: ['slide_block_to_color_target']
[Info] Replay dataset already exists in the disk: replay/replay_train/slide_block_to_color_target
Created Dataset. Time Cost: 0.08458356459935507 minutes
MVT Vars: {'training': True, '_parameters': OrderedDict(), '_buffers': OrderedDict(), '_non_persistent_buffers_set': set(), '_backward_hooks': OrderedDict(), '_is_full_backward_hook': None, '_forward_hooks': OrderedDict(), '_forward_pre_hooks': OrderedDict(), '_state_dict_hooks': OrderedDict(), '_load_state_dict_pre_hooks': OrderedDict(), '_load_state_dict_post_hooks': OrderedDict(), '_modules': OrderedDict(), 'depth': 8, 'img_feat_dim': 3, 'img_size': 220, 'add_proprio': True, 'proprio_dim': 4, 'add_lang': True, 'lang_dim': 512, 'lang_len': 77, 'im_channels': 64, 'img_patch_size': 11, 'final_dim': 64, 'attn_dropout': 0.1, 'decoder_dropout': 0.0, 'self_cross_ver': 1, 'add_corr': True, 'add_pixel_loc': True, 'add_depth': True, 'pe_fix': True}
Start training ...
Rank [0], Epoch [0]: Training on train dataset
  0%|                                                                                                                                                                                   | 0/53333 [00:00<?, ?it/s]/home/angoyal/RVT/rvt/models/rvt_agent.py:518: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  trans_aug_range=torch.tensor(self._transform_augmentation_xyz),
  0%|▏                                                                                                                                                                       | 76/53333 [00:41<7:34:39,  1.95it/s]
