ghost closed this issue 6 years ago.
My understanding is that imitation learning is just supervised learning. Take a racing car, for instance: the inputs are images, and the outputs are steering angle, throttle, etc. You can use a CNN to train your model. What I don't understand is how Unity ML-Agents can train the model in such a short time.
@Xunzhaocunzi You are right: Imitation Learning, the way we implement it, is supervised learning. One trick used to make the agent learn faster is to reuse, at every step, a batch of experiences randomly sampled from the training buffer. This means each experience is used multiple times, which makes training faster.
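A minimal sketch of that reuse trick in plain Python, assuming a simple fixed-capacity buffer of (observation, action) pairs. The class and method names are hypothetical and are not the ML-Agents trainer API. Each step adds one teacher demonstration and draws one random batch, so older experiences end up in many batches.

```python
import random
from collections import deque


class DemonstrationBuffer:
    """Hypothetical fixed-capacity store of (observation, action) pairs from the teacher."""

    def __init__(self, capacity: int, seed: int = 0):
        self.data = deque(maxlen=capacity)  # oldest pairs are dropped when full
        self.rng = random.Random(seed)

    def add(self, observation, action):
        self.data.append((observation, action))

    def sample(self, batch_size: int):
        # No pair repeats within one batch, but the same pair can be drawn
        # again in later batches: that is where the reuse comes from.
        k = min(batch_size, len(self.data))
        return self.rng.sample(list(self.data), k)


if __name__ == "__main__":
    buffer = DemonstrationBuffer(capacity=1000)
    use_counts = {}
    for step in range(200):
        buffer.add(observation=step, action=step % 3)  # one new demonstration per step
        for obs, _ in buffer.sample(batch_size=32):     # one training batch per step
            use_counts[obs] = use_counts.get(obs, 0) + 1
    print(f"{len(buffer.data)} experiences, {sum(use_counts.values())} total uses")
```

Running it prints `200 experiences, 5904 total uses`, so in this toy setup each demonstration is seen roughly 30 times instead of once.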
Why is the student agent, and not the teacher, the one that broadcasts the learning to Python? I have a situation where I would like a single teacher agent to train the neural net (brain), which is later used for inference on similar agents.
@vincentpierre Which supervised learning algorithm have you implemented?
@RossMelbourne The student does not broadcast; it is set to External so that Python can control it. This is done so that you can see how the student is doing while it trains. The Teacher is a Player brain with broadcast turned on. We currently only support collecting data and training simultaneously.
@Vlad1094 We use a simple NN. If you want to look at the code, the model is in python/unitytrainers/bc/models.py and python/unitytrainers/models.py. The observations go through dense layers, and the output is an action: if the action space is discrete, it is a probability distribution over actions; if it is continuous, it is a vector of continuous actions. The loss is the mean squared difference between the target action (given by the player) and the action prescribed by the dense layers. The model is trained using an experience buffer from which batches of experiences are sampled at each step.
I hope this helps.
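A minimal sketch of the training loop described above, in plain Python with no ML framework. Assumptions: a single linear dense layer instead of the stacked dense layers in models.py, continuous actions only, and a hypothetical `teacher_policy` standing in for the Player brain. Each step collects one teacher demonstration and then takes one gradient step on the mean squared error over a randomly sampled batch, roughly following the collect-and-train-simultaneously setup.

```python
import random


def teacher_policy(obs):
    # Hypothetical "player": continuous actions (steering, throttle) as a fixed
    # linear function of a 3-dimensional observation.
    steering = 0.5 * obs[0] - 0.2 * obs[1]
    throttle = 0.3 * obs[2] + 0.1
    return [steering, throttle]


class DenseStudent:
    """One dense (fully connected) layer: action = W @ obs + b, no activation."""

    def __init__(self, obs_size, action_size):
        self.W = [[0.0] * obs_size for _ in range(action_size)]
        self.b = [0.0] * action_size

    def act(self, obs):
        return [sum(w * o for w, o in zip(row, obs)) + bias
                for row, bias in zip(self.W, self.b)]

    def train_step(self, batch, lr):
        """One gradient-descent step on the mean squared error over the batch."""
        n = len(batch) * len(self.b)  # number of squared terms being averaged
        grad_W = [[0.0] * len(row) for row in self.W]
        grad_b = [0.0] * len(self.b)
        loss = 0.0
        for obs, target in batch:
            for i, (pred, tgt) in enumerate(zip(self.act(obs), target)):
                err = pred - tgt  # student action minus teacher action
                loss += err * err / n
                grad_b[i] += 2 * err / n
                for j, o in enumerate(obs):
                    grad_W[i][j] += 2 * err * o / n
        # Apply the update only after the whole batch has been processed.
        for i in range(len(self.W)):
            self.b[i] -= lr * grad_b[i]
            for j in range(len(self.W[i])):
                self.W[i][j] -= lr * grad_W[i][j]
        return loss


def train(steps=2000, batch_size=16, lr=0.1, seed=0):
    rng = random.Random(seed)
    student = DenseStudent(obs_size=3, action_size=2)
    buffer = []
    loss = None
    for _ in range(steps):
        # Collect: the teacher acts on a new observation and the pair is stored.
        obs = [rng.uniform(-1, 1) for _ in range(3)]
        buffer.append((obs, teacher_policy(obs)))
        # Train: sample a batch from everything collected so far.
        batch = rng.sample(buffer, min(batch_size, len(buffer)))
        loss = student.train_step(batch, lr)
    return student, loss
```

Because this toy teacher is itself linear, `student, loss = train()` recovers its weights (W close to [[0.5, -0.2, 0], [0, 0, 0.3]], b close to [0, 0.1]); a nonlinear teacher would need hidden layers with activations, and a discrete action space would need a probability output instead of a raw vector.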
@vincentpierre Thank you, it helps a lot. I am writing a paper on this topic and want to include some references. Do you have a paper or article, something more theoretical, that describes this NN more rigorously and proves its correctness? I'm sorry to bother you.
@Vlad1094 The paper https://arxiv.org/pdf/1703.07326.pdf helped me a lot in getting a basic understanding of Imitation Learning.
If there is still a need for Imitation Learning papers, here is another one. :) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Englert_ABJ_2013.pdf
@FireDragonGameStudio Thank you 😊
Thank you for the discussion. We are closing this issue due to inactivity. Feel free to reopen it if you'd like to continue the discussion, though.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
What kind of algorithm is under the hood of the Imitation Learning implementation? Could you share some theoretical material? Thank you in advance.