Closed HosseinSheikhi closed 3 years ago
Hi @HosseinSheikhi, thanks for bringing this to our attention. That shouldn't be the case, so there may have been an issue with the translate PR. @WendyShang, can you check if your PR was incorporated correctly? Maybe it's using different hyperparams?
Hi @HosseinSheikhi, the plot of my run from scripts/cheetah_test.sh is attached. Could you share your PyTorch version as well as the scripts you used to produce those diverged plots?
Thank you both for your prompt replies. The PyTorch version is 1.6.0. Here are the scripts (I run Cartpole pretty much like Walker, just changing the action repeat to 8):
```
CUDA_VISIBLE_DEVICES=0 python train.py \
  --domain_name cheetah --task_name run \
  --encoder_type pixel --work_dir ./tmp \
  --action_repeat 4 --num_eval_episodes 10 \
  --pre_transform_image_size 100 --image_size 84 \
  --agent rad_sac --frame_stack 3 --data_augs translate \
  --seed 23 --critic_lr 2e-4 --actor_lr 1e-3 \
  --eval_freq 10000 --batch_size 128 --num_train_steps 500000
```
```
CUDA_VISIBLE_DEVICES=0 python train.py \
  --domain_name walker --task_name walk \
  --encoder_type pixel --work_dir ./tmp \
  --action_repeat 2 --num_eval_episodes 10 \
  --pre_transform_image_size 100 --image_size 84 \
  --agent rad_sac --frame_stack 3 --data_augs translate \
  --seed 23 --critic_lr 1e-3 --actor_lr 1e-3 \
  --eval_freq 10000 --batch_size 128 --num_train_steps 500000
```
@HosseinSheikhi For Cheetah Run, could you please try the following and let me know how the training curves look:

```
CUDA_VISIBLE_DEVICES=0 python train.py \
  --domain_name cheetah --task_name run \
  --encoder_type pixel --work_dir ./tmp \
  --action_repeat 4 --num_eval_episodes 10 \
  --pre_transform_image_size 100 --image_size 108 \
  --agent rad_sac --frame_stack 3 --data_augs translate \
  --seed 23 --critic_lr 2e-4 --actor_lr 2e-4 --eval_freq 10000 \
  --batch_size 128 --num_train_steps 600000 --init_steps 10000 \
  --num_filters 32 --encoder_feature_dim 64 --replay_buffer_capacity 100000
```
I will update you, but just in case, pre_transform_image_size should not be greater than image_size?
Ah, this may be the source of your issue. In your run you have --pre_transform_image_size 100 --image_size 84, but translate needs pre_transform_image_size < image_size, since the translate aug shifts the initially rendered image (of size pre_transform_image_size) within a larger container (of size image_size). This may be why your runs are blowing up.
Note: for crop, it's the opposite, image_size < pre_transform_image_size, since you're cropping a smaller image out of the initially rendered one.
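To make the size constraint concrete, here is a minimal NumPy sketch of what the two augmentations do. This is illustrative, not the repo's exact data_augs implementation, and the function names are my own:

```python
import numpy as np

def random_translate(imgs, size):
    """Place each (C, H, W) image at a random offset inside a larger
    (C, size, size) zero canvas. Needs size >= H and size >= W,
    i.e. image_size >= pre_transform_image_size."""
    n, c, h, w = imgs.shape
    assert size >= h and size >= w, "translate needs image_size >= pre_transform_image_size"
    out = np.zeros((n, c, size, size), dtype=imgs.dtype)
    h0 = np.random.randint(0, size - h + 1, n)
    w0 = np.random.randint(0, size - w + 1, n)
    for i, (hh, ww) in enumerate(zip(h0, w0)):
        out[i, :, hh:hh + h, ww:ww + w] = imgs[i]
    return out

def random_crop(imgs, size):
    """Cut a random (size, size) window out of each image. Needs
    size <= H and size <= W, i.e. image_size <= pre_transform_image_size."""
    n, c, h, w = imgs.shape
    assert size <= h and size <= w, "crop needs image_size <= pre_transform_image_size"
    h0 = np.random.randint(0, h - size + 1, n)
    w0 = np.random.randint(0, w - size + 1, n)
    return np.stack([imgs[i, :, hh:hh + size, ww:ww + size]
                     for i, (hh, ww) in enumerate(zip(h0, w0))])
```

With --pre_transform_image_size 100, translate wants image_size like 108 (100 fits inside 108), while crop wants image_size like 84 (84 is cut out of 100); passing 84 to translate trips the assertion rather than silently misbehaving.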
Yes, that was the reason; it's converging now. Thanks!
Hello, I wonder if I need to do some fine-tuning to get results with the translate augmentation. It always diverges! I have tested Cartpole, Walker, and Cheetah. In the following figures, the diverged run is translate.
![rsz_screenshot_from_2020-10-27_20-25-48](https://user-images.githubusercontent.com/64957461/97387434-372eb880-1893-11eb-9f08-87c58d39acfa.png)