StoneT2000 / trajectorytranslation

Code for Abstract-to-Executable Trajectory Translation for One Shot Task Generalization (ICML 2023)
https://trajectorytranslation.github.io/

Some Hardware issues #3

Open siaoliu opened 11 months ago

siaoliu commented 11 months ago

Thanks to the authors for such excellent work! Because my platform has limited computational resources, I would like to ask about the hardware requirements for reproducing this work. Is an Nvidia RTX 3090 sufficient, and could you say roughly how long each training run takes?

StoneT2000 commented 11 months ago

I use a 3080 or 2080 for all my experiments. Training takes about 1 day at most. (The block stacking finetuning stage takes an extra day, but it converges far earlier.)

It trains faster if you have more CPUs and can increase the number of training envs
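As a rough illustration of matching the number of training envs to the available CPUs, here is a minimal sketch. `suggest_n_envs` is a hypothetical helper, not part of this repo, and the idea of leaving one core free for the learner process is an assumption rather than a measured recommendation.

```python
import os


def suggest_n_envs(reserved_cpus=1, max_envs=None):
    """Suggest how many parallel training envs to run (hypothetical helper).

    Uses the CPU count reported by the OS, keeps `reserved_cpus` cores free
    for the main training process, and never returns less than 1.
    """
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    n_envs = max(1, cpus - reserved_cpus)
    if max_envs is not None:
        n_envs = min(n_envs, max_envs)
    return n_envs


if __name__ == "__main__":
    print("suggested number of training envs:", suggest_n_envs())
```

The returned value would then be used for the number of envs in the experiment config; the best setting still depends on how heavy each env is, so treat it as a starting point.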

siaoliu commented 11 months ago

Thanks for your reply! I have attempted this on my device. However, I encountered some issues.

  1. Running 'conda env create -f environment.yml' directly did not work for me, so I installed all the dependencies one by one. After that I ran into another issue. The line

    env = SubprocVecEnv([make_env(i) for i in range(exp_cfg.n_envs)])

    fails with

    ConnectionResetError: [Errno 104] Connection reset by peer

I will re-clone this repo on my laptop and really hope to get your help. Your work has opened up entirely new perspectives on generalization in robotics for me, and this line of research deserves greater attention from more researchers.

It would be appreciated if the author could provide more detailed installation instructions. I tried using the check_env function in stable-baselines3 but encountered issues creating the environment, likely due to version incompatibility between the various installed packages.

siaoliu commented 11 months ago

Thanks again to the authors for such excellent work. I have solved the training issues and written an install.md, as follows. When using a server without a screen, there are still some errors. Maybe the env does not support headless mode?

Install torch

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

Upgrade pip and pin setuptools and wheel, which is needed for installing gym < 0.22

pip install --upgrade pip setuptools==57.5.0 wheel==0.38.3

Install mani-skill2 to pull in several dependencies with one command; this package will be removed later.

pip install mani-skill2

Install the other dependencies

pip install stable-baselines3==1.8 transformers wandb omegaconf pyglet open3d tensorboard moviepy

Install all envs

pip install minigrid dm_control x-magical

Local installs for most tasks

pip install -e ./paper_rl/
pip install -e . 
pip install -e external/ManiSkill2
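Since version incompatibility between packages was the suspected cause of the earlier failures, a quick check that the pinned versions actually ended up in the environment may help. The following is a minimal sketch: `check_pins` and the `PINS` table are hypothetical, the pins are copied from the pip commands above, and the comparison is a loose prefix match so that a local build tag such as `+cu113` is accepted.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install commands above (illustrative subset).
PINS = {
    "torch": "1.12.1",
    "torchvision": "0.13.1",
    "setuptools": "57.5.0",
    "wheel": "0.38.3",
    "stable-baselines3": "1.8",
}


def check_pins(pins, lookup=version):
    """Return {package: problem} for every pin that is not satisfied.

    `lookup` maps a distribution name to its installed version string and
    defaults to importlib.metadata.version; it is injectable for testing.
    """
    problems = {}
    for name, wanted in pins.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Loose prefix match: "1.12.1+cu113" satisfies the pin "1.12.1".
        if not found.startswith(wanted):
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    for pkg, problem in check_pins(PINS).items():
        print(f"{pkg}: {problem}")
```

An empty result only says the listed pins match; it does not prove that the remaining unpinned packages are mutually compatible.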

If you want to try the Opendrawer task, the sapien version needs to be switched to v1:

conda create -n tr2_open --clone tr2
conda activate tr2_open
pip uninstall mani-skill2
pip install sapien==1.1
pip install external/ManiSkill

StoneT2000 commented 10 months ago

Sorry for the very delayed reply. It'll be a little hard to debug with just ConnectionResetError: [Errno 104] Connection reset by peer. This error usually happens when one of the parallel envs has an error. Could you show the full error log?
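Because an error inside a worker process reaches the parent only as the connection reset, one way to get at the underlying traceback is to build and reset each env in the main process first. This is a minimal sketch, assuming `make_env(i)` returns a zero-argument factory as in the SubprocVecEnv call quoted earlier; `find_failing_envs` is a hypothetical helper, not part of this repo or of stable-baselines3.

```python
import traceback


def find_failing_envs(env_fns):
    """Build and reset each env in the current process (hypothetical helper).

    `env_fns` is a list of zero-argument callables that each return an env
    with `reset()` and `close()` methods. Returns {index: traceback_text}
    for every factory that raised, so the real error is visible.
    """
    failures = {}
    for idx, make in enumerate(env_fns):
        try:
            env = make()   # construction errors show up here
            env.reset()    # many env bugs only appear on the first reset
            env.close()
        except Exception:
            failures[idx] = traceback.format_exc()
    return failures
```

With the thread's names this would be called as `find_failing_envs([make_env(i) for i in range(exp_cfg.n_envs)])`, and any tracebacks it returns can be pasted here as the full error log.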

This was a fairly "old" project so I may need some time to go through my old environment and see if I can pull out a more reproducible environment.yml file for use with conda.