clvrai / skill-chaining

Adversarial Skill Chaining for Long-Horizon Robot Manipulation via Terminal State Regularization (CoRL 2021)
https://clvrai.com/skill-chaining

Question about the paper #2

Closed Jiayuan-Gu closed 2 years ago

Jiayuan-Gu commented 2 years ago

Hi @youngwoon,

Thanks for the nice work. I have a question about the design of your terminal state regularizer. If the initial set discriminator learns to distinguish the states perfectly, then the terminal state of the current skill will be judged as different from the initial states of the next skill, from the discriminator's perspective. In this case, $R_{TSR}$ will be 0. But isn't this the opposite of what is expected (the terminal and initial states should be matched)? Could you help explain the idea behind it? Thanks.
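To make my reading concrete, here is a minimal sketch of how I understand the regularizer. The class/function names and the choice of using the raw discriminator probability as the reward are my own assumptions for illustration, not the actual code in this repo:

```python
import torch
import torch.nn as nn


class InitSetDiscriminator(nn.Module):
    """Hypothetical discriminator: estimates the probability that a state
    lies in the initial (initiation) set of the next skill."""

    def __init__(self, state_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state):
        return torch.sigmoid(self.net(state))


def tsr_reward(discriminator, terminal_state):
    """Terminal state regularization reward for the current skill.

    If the discriminator separates the current skill's terminal states from
    the next skill's initial states perfectly, then D(s_T) -> 0 and this
    reward vanishes, which is the scenario I am asking about.
    """
    with torch.no_grad():
        return discriminator(terminal_state).squeeze(-1)
```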

youngwoon commented 2 years ago

Hi Jiayuan,

I think that's a fair concern: the discriminator can become too good at discriminating between the terminal states of one skill and the initial states of the following skill. This is a well-known issue in GAN training (e.g., the discriminator quickly learns to call generated images fake, yet with careful learning-rate scheduling the generator can still improve and eventually produce realistic images), and it leads to unstable training in many cases. The same applies to our situation.

Our method also uses adversarial training, so it sometimes suffers from training instability. However, with careful tuning, we were able to make it work for long skill chaining.
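For context, here is a minimal sketch of the alternating updates we are talking about, assuming a GAIL-style setup where the discriminator separates the next skill's initial states from the current skill's terminal states and the policy receives the discriminator score as an additional reward. The function names and the PPO/SAC placeholder are illustrative, not the exact code in this repository:

```python
import torch
import torch.nn.functional as F


def discriminator_step(disc, disc_opt, init_states, term_states):
    """One adversarial update: push D(s) -> 1 on the next skill's initial
    states and D(s) -> 0 on the current skill's terminal states.

    `disc` is any nn.Module mapping states to unnormalized logits.
    """
    logits_init = disc(init_states)
    logits_term = disc(term_states)
    loss = F.binary_cross_entropy_with_logits(
        logits_init, torch.ones_like(logits_init)
    ) + F.binary_cross_entropy_with_logits(
        logits_term, torch.zeros_like(logits_term)
    )
    disc_opt.zero_grad()
    loss.backward()
    disc_opt.step()
    return loss.item()


# Alternating training loop (policy update kept abstract):
#   rollouts = collect_rollouts(policy)
#   reward = task_reward + beta * torch.sigmoid(disc(rollouts.terminal_states))
#   policy_step(policy, rollouts, reward)        # e.g., a PPO/SAC update
#   discriminator_step(disc, disc_opt,
#                      next_skill_initial_states, rollouts.terminal_states)
#
# The instability arises because the policy's reward is non-stationary:
# every discriminator step changes the reward landscape the policy optimizes.
```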