clvrai / furniture

IKEA Furniture Assembly Environment for Long-Horizon Complex Manipulation Tasks
https://clvrai.com/furniture
MIT License
496 stars 56 forks

About Training and Testing #14

Closed iltertaha closed 4 years ago

iltertaha commented 4 years ago

Hi, First, thank you for sharing this great project.

I have some questions related to the training and testing phases. As described in overview.md, I used the command below to train:

python3 -m rl.main --env FurnitureBaxterBlockEnv --prefix demo --reward_scale 3 --wandb True

Then I noticed that this doesn't work with Unity and gives the following

[Errno 111] Connection refused now connecting to 1050

message all the time.

So I added --unity False to make it work. (I am not sure this is the correct way. Is it possible to see the simulator while training?) After configuring my wandb account, the training process started.

1) Assuming the training process has completed (it shows 2000000 iterations), I don't understand how to test the trained model. (Edit: I have also waited for 300k / 2000000 iterations on CPU, but Baxter still cannot perform the "hold" phase of the task.)

2) Also, there are checkpoint functions that save the model periodically. But when I rerun the command, does the trainer start from the beginning or resume from where it was interrupted? If both are possible, how do we choose?

3) I would be very happy if you could clarify the correct commands to train and visually test a specific task, for example the one in furniture_cursor_toytable.py.

Thank you for your time

Best Regards

edwhu commented 4 years ago

Hello, sorry for the late response. That is interesting; do you have the Unity binary installed? The connection error usually happens when the system cannot find the Unity binary, or when the DISPLAY variable is set incorrectly so Unity cannot render to a display. What hardware / OS are you running this on? And what branch of the code? If you look at https://github.com/clvrai/furniture/search?q=os.environ&unscoped_q=os.environ, we manually set the DISPLAY variable to :1, but you may need to change it to :0 in rl.main.
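
Concretely, the change amounts to overriding the environment variable before Unity is launched. A minimal sketch, assuming your X server is on display :0 (check with echo $DISPLAY):

import os

# Point Unity's off-screen rendering at an existing X display.
# The code defaults this to ":1"; swap in whatever your machine actually uses.
os.environ["DISPLAY"] = ":0"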

If you add the --unity False flag, it will not render the simulation in Unity, so you may not see any videos. The checkpoints are usually saved to the log directory. If you rerun the command with the same prefix, it will continue from that run.
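
Roughly, resuming works by scanning the run's log directory for the newest checkpoint and loading it if one exists; otherwise training starts from scratch. A minimal sketch of that pattern (not the repo's exact code; the directory layout and file naming here are assumptions):

import glob
import os
import torch

def find_latest_checkpoint(log_dir):
    """Return the newest checkpoint file in log_dir, or None if there is none."""
    ckpts = glob.glob(os.path.join(log_dir, "ckpt_*.pt"))
    return max(ckpts, key=os.path.getctime) if ckpts else None

# Hypothetical run directory named after env / algo / prefix / seed.
log_dir = "log/rl.FurnitureBaxterBlockEnv.ppo.demo.123"
ckpt_path = find_latest_checkpoint(log_dir)
if ckpt_path is not None:
    state = torch.load(ckpt_path)  # restore weights, optimizer state, and step counter
else:
    state = None                   # no checkpoint yet: start a fresh run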

Here is an example command for the furniture_cursor_toytable task:

python -m rl.main --wandb True --env FurnitureSawyerToyTableEnv --prefix ik4_256hidsize_01ent_hizdistrew_011z_fastckpt_gradnorm --max_grad_norm 20 --max_episode_steps 150 --grip_dist_rew 100 --site_dist_rew 100 --site_up_rew 20 --pick_rew 50 --gpu 2 --port 1090 --ctrl_penalty 0.00001 --aligned_rew 1 --notes "no constant aligned rew; topsite z offset 0.11; low grip up rew; 150 steps; 1e-2 entropy impedance" --algo ppo --virtual_display :1 --control_type ik --discretize_grip False --entropy_loss_coeff 1e-2 --rl_hid_size 256 --topsite_z_offset 0.11 --z_dist_rew 100
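
If you just want to confirm that training runs before tuning anything, a trimmed-down variant along these lines should also work, assuming the omitted reward weights fall back to reasonable defaults (adjust --gpu and --port for your machine; the prefix is just a placeholder):

python -m rl.main --wandb True --env FurnitureSawyerToyTableEnv --prefix toytable_test --algo ppo --control_type ik --max_episode_steps 150 --gpu 0 --port 1090 --virtual_display :1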

iltertaha commented 4 years ago

Hi, thanks for your reply. I am using the dev branch. I tried your example command to train, changing only --gpu 2 to --gpu 0. Since I had found earlier that --unity False with the simple_block task still stores video samples, I decided to run your example command with the --unity False flag as well.

However, after training starts, it fails with the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/research/furniture-dev/rl/main.py", line 121, in <module>
    run(args)
  File "/home/research/furniture-dev/rl/main.py", line 70, in run
    trainer.train()
  File "/home/research/furniture-dev/rl/trainer.py", line 190, in train
    self._train_rl()
  File "/home/research/furniture-dev/rl/trainer.py", line 329, in _train_rl
    rollout, info = self._evaluate(step=step, record=config.record)
  File "/home/research/furniture-dev/rl/trainer.py", line 356, in _evaluate
    self._runner.run_episode(is_train=False, record=record)
  File "/home/research/furniture-dev/rl/rollouts.py", line 163, in run_episode
    if record: self._store_frame(env)
  File "/home/research/furniture-dev/rl/rollouts.py", line 214, in _store_frame
    frame = np.concatenate([frame, np.zeros((fheight, fwidth, 3))], 0)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 3 dimension(s)

The full output can be seen at this link: https://pastebin.pl/view/raw/b9e340c0

Then I decided to run the simple block task on the dev branch (not master), and I got the same error using this command:

python3 -m rl.main --env FurnitureBaxterBlockEnv --prefix demo --gpu 0 --reward_scale 3

I couldn't figure out what's going wrong. On which branch do you run this code?

Besides, when I use your command directly, it gets stuck at:

[2020-03-22 14:23:51,925] Run 1 evaluations at step=1000 rl.FurnitureSawyerToyTableEnv.ppo.ik4_256hidsize_01ent_hizdistrew_011z_fastckpt_gradnorm.123: 0%|

Note: CUDA is working properly, wandb is set correctly, and I run python3 without any errors (as I did on master).

I couldn't figure out how to solve it. I am looking forward to hearing from you.

Thank you for your time. Best regards

edwhu commented 4 years ago

Are you using the Unity bundle to render? You may be using the MuJoCo rendering, which is not recommended since we do not test it. I have just pushed a fix for a MuJoCo rendering bug to the dev branch. You can try it and see if it helps.
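
For context, the ValueError above comes from _store_frame concatenating a 2-D frame (what the MuJoCo off-screen path was returning) with a 3-D array of zeros. A minimal sketch of the kind of guard that avoids it, not the actual patch:

import numpy as np

def pad_frame(frame, fheight, fwidth):
    """Pad a rendered frame with black space below it for video recording."""
    # If the renderer returned a 2-D (grayscale) image, expand it to 3 channels
    # so it matches the RGB padding being appended along the height axis.
    if frame.ndim == 2:
        frame = np.stack([frame] * 3, axis=-1)
    return np.concatenate([frame, np.zeros((fheight, fwidth, 3))], axis=0)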

iltertaha commented 4 years ago

Thanks, after your comment I re-downloaded the dev branch and it now works fine.

I've also seen that there is an eval function in trainer.py.

For example: if I have trained a model on the simple block task and now want to use this learned agent to test its performance on a different block (a different environment), what should I do?

edwhu commented 4 years ago

Hello, sorry for the late response. You should specify which furniture to load using --furniture_name. There is also a config option for loading saved checkpoints (init_ckpt_path; refer to rl/trainer.py).
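
For example, an evaluation run might look like the command below, assuming init_ckpt_path is exposed as a command-line flag like the other config options (double-check rl/trainer.py for the exact names; the furniture name and checkpoint path are placeholders):

python -m rl.main --env FurnitureBaxterBlockEnv --algo ppo --prefix eval_demo --furniture_name <furniture_to_load> --init_ckpt_path <path/to/ckpt.pt>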

iltertaha commented 4 years ago

Thanks.