Open flrngel opened 6 years ago
Hi @flrngel ,
If you'd like to play around with code, here is a public implementation: https://github.com/haarnoja/sac/blob/master/DIAYN.md
Here are answers to your questions:
@ben-eysenbach I never expected the author would find this and comment on my question. Thank you!
https://arxiv.org/abs/1802.06070
Abstract
1. Introduction
2. Related Work
The paper says that maximizing diversity works better than a task-specific reward for learning complex behaviors
3. Diversity is all you need
3.1. How it works
H[a|s] - MI(a,z|s) = H[a|s,z] for a continuous action space
F(Θ) = H[a|s,z] + H[z] - H[z|s]
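
To sanity-check the objective, here is a small numeric sketch. It compares the entropy form above with the mutual-information form the paper starts from, MI(s,z) + H[a|s] - MI(a,z|s), on a toy distribution. Everything in it is an assumption for illustration: the joint p(z, s, a) is random and discrete (3 skills, 4 states, 2 actions), whereas the paper works with continuous actions, and none of the names come from the DIAYN code.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability array of any shape."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(0)

# Hypothetical toy joint p(z, s, a): axis 0 = skill z, axis 1 = state s, axis 2 = action a.
# The +0.1 offset keeps every entry strictly positive so the logs below are finite.
joint = rng.random((3, 4, 2)) + 0.1
joint /= joint.sum()

# Marginals needed for the entropies and mutual informations.
p_z = joint.sum(axis=(1, 2))   # p(z)
p_s = joint.sum(axis=(0, 2))   # p(s)
p_zs = joint.sum(axis=2)       # p(z, s)
p_sa = joint.sum(axis=0)       # p(s, a)

# Conditional entropies via the chain rule, e.g. H[z|s] = H[z,s] - H[s].
H_z = entropy(p_z)
H_z_given_s = entropy(p_zs) - entropy(p_s)
H_a_given_s = entropy(p_sa) - entropy(p_s)
H_a_given_sz = entropy(joint) - entropy(p_zs)

# Mutual informations computed directly from their KL-divergence definitions.
mi_s_z = float((p_zs * np.log(p_zs / np.outer(p_z, p_s))).sum())
ratio = joint * p_s[None, :, None] / (p_zs[:, :, None] * p_sa[None, :, :])
mi_a_z_given_s = float((joint * np.log(ratio)).sum())

# The two forms of the objective should agree up to floating-point error.
objective_mi_form = mi_s_z + H_a_given_s - mi_a_z_given_s
objective_entropy_form = H_a_given_sz + H_z - H_z_given_s

print(objective_mi_form, objective_entropy_form)
```

The two printed values should match up to rounding, because MI(s,z) = H[z] - H[z|s] and H[a|s] - MI(a,z|s) = H[a|s,z].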
3.2. Implementation
4. What skills are learned?
(alpha = 0.01 gives the most discriminative illustration)
Question