Hi! Thank you for the question. The paper mentions that Distral learns the distilled policy in the space of policies, which is better than learning it in the space of parameters. The basic idea is that it should be easier to interpolate in the space of functions/policies than in the parameter space. Two networks can make very similar predictions on any given input while having very different weights. Averaging their predictions gives a meaningful prediction, while averaging their weights could produce a model that is worse than either of the original two. This is also how ensembling works: we average the predictions of multiple models, not their weights.
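To make this concrete, here is a minimal PyTorch sketch (my own illustration, not from the paper; the network sizes and inputs are arbitrary) contrasting averaging in function space with averaging in parameter space:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_net():
    # Two independently initialized networks of the same architecture.
    return nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))

net_a, net_b = make_net(), make_net()
x = torch.randn(8, 4)  # a hypothetical batch of inputs

# Function-space average: average the predictions (as in ensembling).
# This is always a valid prediction: a mixture of the two output distributions.
avg_prediction = 0.5 * (net_a(x).softmax(-1) + net_b(x).softmax(-1))

# Parameter-space average: average the weights.
# Hidden units of independently trained nets are not aligned (e.g. they can
# be permuted), so the weight-averaged network has no reason to behave like
# either parent and can be worse than both.
net_avg = make_net()
with torch.no_grad():
    for p_avg, p_a, p_b in zip(net_avg.parameters(),
                               net_a.parameters(),
                               net_b.parameters()):
        p_avg.copy_(0.5 * (p_a + p_b))
weight_avg_prediction = net_avg(x).softmax(-1)
```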
The paper does not comment on the star/centroid structure being better. If you think otherwise, could you please point me to the relevant line in the paper?
@shagunsodhani Thank you for your help. The paper mentions the star/centroid structure in Section 2.3, "Policy Gradient and a Better Parameterization". Do you know the advantages of the centroid and star-shaped structure? Could you explain them in detail?
One advantage could be reduced computation: every model distills against a single central model, so the number of distillation operations is linear in the number of models. With, say, a fully-connected topology, the number of operations would be quadratic. The obvious limitation is that all information exchange is bottlenecked on the central model. A toy sketch of the counting argument is below.
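Here is a rough sketch (my own illustration, not the paper's code; names like `task_logits` and `central_logits` are hypothetical) of how the number of KL distillation terms grows in each topology:

```python
import torch
import torch.nn.functional as F

n_tasks = 4
# Hypothetical per-task policy logits and distilled (central) policy logits
# for a batch of 32 states and 5 discrete actions.
task_logits = [torch.randn(32, 5) for _ in range(n_tasks)]
central_logits = torch.randn(32, 5)

def kl(p_logits, q_logits):
    # KL(p || q) between two categorical policies given their logits.
    return F.kl_div(q_logits.log_softmax(-1),
                    p_logits.softmax(-1),
                    reduction="batchmean")

# Star topology: each task policy is distilled against the central policy
# only -> n_tasks KL terms (linear in the number of models).
star_loss = sum(kl(t, central_logits) for t in task_logits)

# Fully-connected topology: every pair of task policies is distilled
# against each other -> n_tasks * (n_tasks - 1) / 2 terms (quadratic).
pair_loss = sum(kl(task_logits[i], task_logits[j])
                for i in range(n_tasks)
                for j in range(i + 1, n_tasks))
```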
@shagunsodhani Thank you very much!
Cool - closing the task - feel free to reopen if needed :)
Description
In Section 2.3, "Policy Gradient and a Better Parameterization", of 'Distral: Robust Multitask Reinforcement Learning', the authors argue that the centroid and star-shaped structure of Distral helps learn a better distilled policy. However, the explanation is very brief. Do you know the advantages of the centroid and star-shaped structure? Could you explain them in detail? I tried to contact the authors of the paper but have not received a reply, so I am asking here.