Uses multi-head SAC (actor, critic, exploration policy) on Continual World.
Behavioral cloning (BC) improves transfer in the long-sequence scenario, but not in the two-task scenario.
Regularizing the critic deteriorates performance. The practical recommendation is to regularize only the actor.
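A minimal sketch of what "regularize only the actor" could look like, assuming a diagonal-Gaussian policy and a BC term written as a KL divergence between the action distribution stored for earlier-task states and the current actor's output on those states. The function names, the `bc_coef` weight, and the KL direction are illustrative assumptions, not taken from the paper; the critic loss is simply left untouched.

```python
import math

def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between two diagonal Gaussians, given per-dimension lists."""
    return sum(
        math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
        for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q)
    )

def actor_loss_with_bc(sac_actor_loss, stored, current, bc_coef=1.0):
    """Add a behavioral-cloning penalty to the actor loss only.

    stored / current: lists of (mu, std) pairs, one per replayed state from
    earlier tasks. `stored` holds the old policy's outputs, `current` the
    outputs of the actor being trained. (Hypothetical interface.)
    """
    if not stored:
        # First task: nothing to clone yet, so the loss is plain SAC.
        return sac_actor_loss
    bc = sum(
        gaussian_kl(mu_old, std_old, mu_new, std_new)
        for (mu_old, std_old), (mu_new, std_new) in zip(stored, current)
    ) / len(stored)
    return sac_actor_loss + bc_coef * bc
```

In a real training loop the same penalty would be computed on tensors inside the autodiff graph; the scalar version here only shows where the term enters.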
Average performance
Forward transfer
Forgetting
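The three metrics above can be sketched as follows. The definitions are assumptions based on how the Continual World benchmark usually reports them (final success rate averaged over tasks, drop in success between the end of a task's own training and the end of the whole sequence, and normalized area under the training curve relative to a from-scratch baseline); the function and argument names are illustrative.

```python
def average_performance(final_success):
    """Mean success rate over all tasks, measured after the full sequence."""
    return sum(final_success) / len(final_success)

def forgetting(success_after_own_training, final_success):
    """Mean drop in success from the end of each task's own training
    to the end of the whole sequence (positive means forgetting)."""
    drops = [a - b for a, b in zip(success_after_own_training, final_success)]
    return sum(drops) / len(drops)

def forward_transfer(auc, auc_baseline):
    """Mean normalized AUC gain over a single-task baseline trained from scratch.

    auc / auc_baseline: per-task areas under the success-rate training curve,
    each scaled to [0, 1]. A baseline AUC of exactly 1 leaves no room for
    improvement and would divide by zero, so it is assumed to be below 1.
    """
    gains = [(a - b) / (1.0 - b) for a, b in zip(auc, auc_baseline)]
    return sum(gains) / len(gains)
```

For example, `forward_transfer([0.75], [0.5])` gives 0.5: the continual learner recovers half of the headroom left by the baseline.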
ClonEx-SAC: behavioral cloning, improved exploration and SAC.
https://arxiv.org/abs/2209.13900