Closed: ruleGreen closed this issue 3 years ago
I checked the code and made almost no changes; I only set the reward type to DISC. I also checked the provided log: most values are likewise under 0.2 when the success rate reaches 0.6~0.8.
Thank you for your quick response. As I posted above, I find that PPO + GAIL works better than reported in the paper, not worse, and I did not change anything except setting the reward type to DISC, so I am a little confused and wondering whether something is wrong. Lastly, thank you for your work. Great job!
The performance of GAIL and AIRL is quite tricky: both rely on TF to achieve relatively high success rates. But, as I mentioned in the paper, TF without RL can achieve higher scores than GAIL and AIRL, which is also why we wanted to get rid of adversarial RL.
Yes, you are right. But I would have supposed that, even with TF, PPO + GAIL could not reach such a high success rate (0.6-0.7) within 10k frames.
For PPO(human), I set the reward type to OFFGAN, so self.pretrain_finished will be False (line 176 of ppo.py) and the "warm up value net" message should appear in the log. Why doesn't this message appear in your log?
Sorry, I didn't really get your question. Do you mean why I didn't record the values of the value net in the shared logs? The experiments for PPO(human) and PPO(offgan) were finished early, and we did not use a value-net warm-up for those two agents. This was also the original setup for GAIL and AIRL. But we found it was almost impossible to train GAIL and AIRL stably, so we added the value-net warm-up for those two agents during the first 2k frames. Since the only difference is whether the value net is warmed up for the first 2k frames, it will not affect the overall performance. Of course, you can warm up the value net for PPO(human) and PPO(offgan) as well, just as the current code does.
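For readers following along, the warm-up described above can be sketched roughly as follows. This is a hypothetical, simplified illustration, not the repository's actual code: the names `WARMUP_FRAMES`, `PPOAgent`, and `update` are assumptions, and only the gating on a `pretrain_finished` flag (as mentioned for line 176 of ppo.py) is taken from the discussion.

```python
# Hypothetical sketch of the value-net warm-up gating described above.
# Assumed names: WARMUP_FRAMES, PPOAgent, update; only the
# pretrain_finished flag comes from the thread itself.

WARMUP_FRAMES = 2_000  # warm up the value net during the first 2k frames


class PPOAgent:
    def __init__(self, warmup_frames=WARMUP_FRAMES):
        self.frames = 0
        self.warmup_frames = warmup_frames
        # If no warm-up is requested, pretraining is already "finished".
        self.pretrain_finished = warmup_frames == 0

    def update(self, batch_size):
        """One update step; returns which losses were optimized."""
        self.frames += batch_size
        if not self.pretrain_finished:
            if self.frames >= self.warmup_frames:
                self.pretrain_finished = True
            # During warm-up only the value net is fitted to returns, so
            # early, noisy discriminator rewards (GAIL/AIRL) don't
            # destabilize the policy.
            return ["value_loss"]
        # After warm-up, run the full PPO update.
        return ["policy_loss", "value_loss", "entropy_bonus"]


agent = PPOAgent()
log = [agent.update(batch_size=500) for _ in range(6)]
# First 4 updates (2k frames) optimize only the value loss; the rest
# run the full PPO objective.
```

Setting `warmup_frames=0` reproduces the early PPO(human)/PPO(offgan) runs that skipped the warm-up entirely.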
Hello, I am confused about the different settings used by the different methods. I find that the convergence speed of the various methods differs from what is reported in your paper. Could you share more details about the experiments, e.g., which methods need a warm-up, and how you set the maximum number of warm-up epochs?