LemonWade closed this issue 1 year ago
Hi @LemonWade ,
Happy to help. I don't think you did anything wrong but I am unable to reproduce the issue at my end. The code looks fine, so my hunch would be that it is something to do with the data.
I did the following steps:
1. Downloaded train/slide_block_to_color_target.zip from here, placed it under the folder rvt/data/train, and ran unzip slide_block_to_color_target.zip there.
2. Downloaded replay_train/slide_block_to_color_target.tar.xz from here, placed it under rvt/replay/replay_train, and ran tar -xf slide_block_to_color_target.tar.xz there.
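For anyone who prefers to script the two extraction steps, here is a minimal Python sketch of the same thing. The helper name unpack_task is hypothetical and not part of RVT; it only relies on the standard library's shutil.unpack_archive, which picks the format from the file extension (.zip or .tar.xz).

```python
import shutil
from pathlib import Path


def unpack_task(archive, dest_dir):
    """Extract `archive` into `dest_dir` and return the names found there.

    Hypothetical helper, not part of the RVT code base. It mirrors running
    `unzip` or `tar -xf` from inside the destination folder.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # The archive format is inferred from the extension (.zip, .tar.xz, ...).
    shutil.unpack_archive(str(archive), str(dest))
    return sorted(p.name for p in dest.iterdir())


# Usage with the paths from the steps above (run from the directory that
# contains rvt/):
# unpack_task("rvt/data/train/slide_block_to_color_target.zip", "rvt/data/train")
# unpack_task("rvt/replay/replay_train/slide_block_to_color_target.tar.xz",
#             "rvt/replay/replay_train")
```

The returned list makes it easy to confirm that a slide_block_to_color_target folder now sits next to the archive, as in the folder structure shown in this reply.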
Here is my folder structure:
$ tree -L 3
.
├── config.py
├── configs
│   ├── all.yaml
│   └── peract_official_config.yaml
├── data
│   └── train
│       ├── slide_block_to_color_target
│       └── slide_block_to_color_target.zip
├── eval_internal.py
├── eval.py
├── libs
│   ├── peract
│   │   ├── agents
│   │   ├── ARM_LICENSE
│   │   ├── conf
│   │   ├── eval.py
│   │   ├── helpers
│   │   ├── LICENSE
│   │   ├── media
│   │   ├── model-card.md
│   │   ├── README.md
│   │   ├── requirements.txt
│   │   ├── run_seed_fn.py
│   │   ├── scripts
│   │   ├── setup.py
│   │   ├── train.py
│   │   └── voxel
│   ├── peract_colab
│   │   ├── peract_colab
│   │   ├── peract_colab.egg-info
│   │   └── setup.py
│   ├── PyRep
│   │   ├── build
│   │   ├── cffi_build
│   │   ├── docs
│   │   ├── examples
│   │   ├── LICENSE
│   │   ├── pyrep
│   │   ├── PyRep.egg-info
│   │   ├── README.md
│   │   ├── requirements.txt
│   │   ├── robot_ttms
│   │   ├── setup.py
│   │   ├── system
│   │   ├── tests
│   │   ├── tools
│   │   └── tutorials
│   ├── RLBench
│   │   ├── examples
│   │   ├── LICENSE
│   │   ├── readme_files
│   │   ├── README.md
│   │   ├── requirements.txt
│   │   ├── rlbench
│   │   ├── rlbench.egg-info
│   │   ├── setup.py
│   │   ├── tests
│   │   ├── tools
│   │   ├── travisci_generate_index.py
│   │   ├── travisci_run_tests.py
│   │   └── tutorials
│   └── YARR
│       ├── LICENSE
│       ├── logo.png
│       ├── README.md
│       ├── requirements.txt
│       ├── setup.py
│       ├── yarr
│       └── yarr.egg-info
├── models
│   ├── peract_official.py
│   ├── __pycache__
│   │   ├── peract_official.cpython-38.pyc
│   │   └── rvt_agent.cpython-38.pyc
│   └── rvt_agent.py
├── mvt
│   ├── attn.py
│   ├── augmentation.py
│   ├── aug_utils.py
│   ├── config.py
│   ├── __init__.py
│   ├── mvt.py
│   ├── mvt_single.py
│   ├── __pycache__
│   │   ├── attn.cpython-38.pyc
│   │   ├── augmentation.cpython-38.pyc
│   │   ├── aug_utils.cpython-38.pyc
│   │   ├── config.cpython-38.pyc
│   │   ├── __init__.cpython-38.pyc
│   │   ├── mvt.cpython-38.pyc
│   │   ├── mvt_single.cpython-38.pyc
│   │   ├── renderer.cpython-38.pyc
│   │   └── utils.cpython-38.pyc
│   ├── renderer.py
│   └── utils.py
├── __pycache__
│   └── config.cpython-38.pyc
├── replay
│   └── replay_train
│       ├── slide_block_to_color_target
│       └── slide_block_to_color_target.tar.xz
├── runs
│   └── rvt_tasks_slide_block_to_color_target
│       ├── args.yaml
│       ├── events.out.tfevents.1691690765.neil
│       ├── events.out.tfevents.1691690946.neil
│       ├── exp_cfg.yaml
│       └── mvt_cfg.yaml
├── train.py
└── utils
    ├── custom_rlbench_env.py
    ├── dataset.py
    ├── ddp_utils.py
    ├── get_dataset.py
    ├── __init__.py
    ├── lr_sched_utils.py
    ├── peract_utils.py
    ├── __pycache__
    │   ├── custom_rlbench_env.cpython-38.pyc
    │   ├── dataset.cpython-38.pyc
    │   ├── ddp_utils.cpython-38.pyc
    │   ├── get_dataset.cpython-38.pyc
    │   ├── __init__.cpython-38.pyc
    │   ├── lr_sched_utils.cpython-38.pyc
    │   ├── peract_utils.cpython-38.pyc
    │   ├── rlbench_planning.cpython-38.pyc
    │   └── rvt_utils.cpython-38.pyc
    ├── rlbench_planning.py
    └── rvt_utils.py
Here is the log:
$ python3 train.py --exp_cfg_path configs/all.yaml --device 0 --exp_cfg_opts "tasks slide_block_to_color_target"
dict(exp_cfg)={'agent': 'our', 'tasks': 'slide_block_to_color_target', 'exp_id': 'rvt_tasks_slide_block_to_color_target', 'resume': '', 'bs': 3, 'epochs': 15, 'num_workers': 3, 'sample_distribution_mode': 'task_uniform', 'peract': CfgNode({'lambda_weight_l2': 1e-06, 'lr': 0.00030000000000000003, 'optimizer_type': 'lamb', 'warmup_steps': 2000, 'lr_cos_dec': True, 'add_rgc_loss': True, 'num_rotation_classes': 72, 'transform_augmentation': True, 'transform_augmentation_xyz': [0.125, 0.125, 0.125], 'transform_augmentation_rpy': [0.0, 0.0, 45.0]}), 'rvt': CfgNode({'gt_hm_sigma': 1.5, 'img_aug': 0.1, 'place_with_mean': False, 'move_pc_in_bound': True}), 'peract_official': CfgNode({'cfg_path': 'configs/peract_official_config.yaml'})}
Training on 1 tasks: ['slide_block_to_color_target']
[Info] Replay dataset already exists in the disk: replay/replay_train/slide_block_to_color_target
Created Dataset. Time Cost: 0.08458356459935507 minutes
MVT Vars: {'training': True, '_parameters': OrderedDict(), '_buffers': OrderedDict(), '_non_persistent_buffers_set': set(), '_backward_hooks': OrderedDict(), '_is_full_backward_hook': None, '_forward_hooks': OrderedDict(), '_forward_pre_hooks': OrderedDict(), '_state_dict_hooks': OrderedDict(), '_load_state_dict_pre_hooks': OrderedDict(), '_load_state_dict_post_hooks': OrderedDict(), '_modules': OrderedDict(), 'depth': 8, 'img_feat_dim': 3, 'img_size': 220, 'add_proprio': True, 'proprio_dim': 4, 'add_lang': True, 'lang_dim': 512, 'lang_len': 77, 'im_channels': 64, 'img_patch_size': 11, 'final_dim': 64, 'attn_dropout': 0.1, 'decoder_dropout': 0.0, 'self_cross_ver': 1, 'add_corr': True, 'add_pixel_loc': True, 'add_depth': True, 'pe_fix': True}
Start training ...
Rank [0], Epoch [0]: Training on train dataset
0%| | 0/53333 [00:00<?, ?it/s]/home/angoyal/RVT/rvt/models/rvt_agent.py:518: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
trans_aug_range=torch.tensor(self._transform_augmentation_xyz),
0%|▏ | 76/53333 [00:41<7:34:39, 1.95it/s]
After following your steps and re-downloading the data, I trained the model successfully. The earlier error was most likely caused by file corruption during my original download. Thank you very much for reproducing the process for me. The evaluation also ran normally.
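Since the problem most likely came from a corrupted download, a quick integrity check on the archives before extracting them may save some debugging time. Below is a minimal sketch using only the Python standard library; archive_is_intact is a hypothetical helper, not something RVT provides. It catches truncated or damaged files, but it cannot tell whether an intact archive contains the right data.

```python
import lzma
import tarfile
import zipfile


def archive_is_intact(path):
    """Return True if the zip or tar archive at `path` reads back cleanly.

    Hypothetical helper for illustration. Zip members are checked against
    their stored CRCs; tar archives are read end to end so that a truncated
    or damaged compressed stream raises an error.
    """
    try:
        if zipfile.is_zipfile(path):
            with zipfile.ZipFile(path) as zf:
                # testzip() returns the first bad member name, or None.
                return zf.testzip() is None
        # Default mode "r" detects the compression (e.g. xz) automatically.
        with tarfile.open(path) as tf:
            for member in tf:
                if member.isfile():
                    with tf.extractfile(member) as fh:
                        while fh.read(1 << 20):
                            pass
        return True
    except (zipfile.BadZipFile, tarfile.TarError, lzma.LZMAError,
            EOFError, OSError):
        return False


# Usage with the archives from this thread:
# archive_is_intact("rvt/data/train/slide_block_to_color_target.zip")
# archive_is_intact("rvt/replay/replay_train/slide_block_to_color_target.tar.xz")
```

If either call returns False, downloading that archive again is probably the first thing to try.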
Thank you very much for your work. Below is a bug I encountered while reproducing it.
I downloaded and decompressed the data and replay for a single task. Later, because np.bool is deprecated in NumPy, I replaced all instances of np.bool with np.bool_. When I ran the training code again with python train.py --exp_cfg_path configs/all_100.yaml --device 0, I encountered a KeyError: 'lang_goal_tokens'. Did I do something wrong?
configs/all_100.yaml
logs
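For reference, the np.bool to np.bool_ replacement described in this post can be scripted rather than done by hand. The sketch below is only an illustration: replace_np_bool is a hypothetical helper, it assumes NumPy is imported under the alias np, and it works on raw text, so it would also rewrite matches inside comments and strings.

```python
import re

# Match "np.bool" only as a whole name, so "np.bool_" and "np.bool8"
# are left untouched.
_NP_BOOL = re.compile(r"\bnp\.bool\b")


def replace_np_bool(source):
    """Return `source` with bare np.bool replaced by np.bool_.

    Hypothetical helper; assumes the code refers to NumPy as `np`.
    """
    return _NP_BOOL.sub("np.bool_", source)


# Usage on a single line of code:
# replace_np_bool("mask = np.zeros(4, dtype=np.bool)")
```

Because the pattern stops at a word boundary, running it twice on the same file does not produce np.bool__.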