How to perform imitation learning using player and student brains?

Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.

https://unity.com/products/machine-learning-agents

How to perform imitation learning using player and student brains? #2375

Closed rajatpaliwal closed 5 years ago

rajatpaliwal commented 5 years ago

In my custom-made environment, I have mapped the agent's movement to keyboard keys. I want to use the broadcast feature to collect data generated during Player Brain game sessions and use this data to train an agent in a supervised context. Can someone provide me with the steps to do this?

awjuliani commented 5 years ago

Hi @rajatpaliwal

It sounds like you'd like to do online imitation learning. We have documentation on this here: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-Imitation-Learning.md#online-training

Please let us know if this works for you.

rajatpaliwal commented 5 years ago

Thanks @awjuliani for such a prompt reply. I have two questions: 1) To implement this method, do I need to create two separate agents and map them separately to the teacher and student brains? 2) While creating the agent, do I need to set up rewards for it, or will simply mapping the agent's actions to keyboard buttons do the job?

awjuliani commented 5 years ago

Hi @rajatpaliwal

In this case, you would use two separate agents with separate brains. You don't need to set up rewards if you only plan to use imitation learning.
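Why no reward is needed: in behavioral cloning the student is fitted to the teacher's chosen actions with a supervised loss (here, cross-entropy), so the reward signal never enters the update. The sketch below is a conceptual illustration, not ML-Agents code: `StudentPolicy` is a hypothetical linear softmax policy, and `teacher` is a made-up rule standing in for a human pressing keys on the Player brain.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class StudentPolicy:
    """Hypothetical linear softmax policy over discrete actions (illustration only)."""

    def __init__(self, obs_size, n_actions, lr=0.1):
        self.w = [[0.0] * obs_size for _ in range(n_actions)]
        self.b = [0.0] * n_actions
        self.lr = lr

    def probs(self, obs):
        logits = [sum(wi * xi for wi, xi in zip(row, obs)) + bi
                  for row, bi in zip(self.w, self.b)]
        return softmax(logits)

    def act(self, obs):
        """Greedy action: the index with the highest probability."""
        p = self.probs(obs)
        return max(range(len(p)), key=lambda a: p[a])

    def update(self, obs, teacher_action):
        """One SGD step on the cross-entropy between the student's action
        distribution and the teacher's chosen action. No reward is used.
        Returns the loss measured before the step."""
        p = self.probs(obs)
        loss = -math.log(max(p[teacher_action], 1e-12))
        for a in range(len(p)):
            # Gradient of cross-entropy w.r.t. logit a: p[a] - onehot[a].
            grad = p[a] - (1.0 if a == teacher_action else 0.0)
            for i, x in enumerate(obs):
                self.w[a][i] -= self.lr * grad * x
            self.b[a] -= self.lr * grad
        return loss


def teacher(obs):
    """Stand-in for the human player: press action 0 if obs[0] > obs[1], else 1."""
    return 0 if obs[0] > obs[1] else 1


random.seed(0)
student = StudentPolicy(obs_size=2, n_actions=2, lr=0.5)
for _ in range(2000):
    obs = [random.random(), random.random()]
    student.update(obs, teacher(obs))

test_obs = [[random.random(), random.random()] for _ in range(500)]
accuracy = sum(student.act(o) == teacher(o) for o in test_obs) / len(test_obs)
print(f"student agrees with teacher on {accuracy:.0%} of held-out observations")
```

The only training signal is the pair (observation, teacher action); an environment reward could be absent entirely and the loop would be unchanged.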

As a note, you can check out the "BananaIL" scene which demonstrates online imitation learning: https://github.com/Unity-Technologies/ml-agents/blob/master/UnitySDK/Assets/ML-Agents/Examples/BananaCollectors/Scenes/BananaIL.unity

rajatpaliwal commented 5 years ago

Thanks @awjuliani. I really appreciate your suggestions.

rajatpaliwal commented 5 years ago

Hi @awjuliani, I am trying to perform imitation learning in my custom-made environment, but I am not able to edit the discrete inputs for my actions in the Player Brain. Any thoughts?

rajatpaliwal commented 5 years ago

Sorry for the bother, @awjuliani. It was an issue with Unity on my end. Restarting the project solved it.

rajatpaliwal commented 5 years ago

Hi @awjuliani, while trying to perform online imitation learning in my custom-made environment, when I run the training command ("mlagents-learn config/online_bc_config.yaml --train --slow") in the command prompt, it prompts me to press the Play button. Since my environment is heavy, it takes some time to start after I press Play, and in the meantime the command prompt gives me the error "The Unity environment took too long to respond". Any suggestions on increasing the wait time of the training command so that I can start training in my environment?

rajatpaliwal commented 5 years ago

Hi @awjuliani, does the student agent learn from scratch in each training iteration, or does it also retain some learning from previous iterations while learning from the current one?

awjuliani commented 5 years ago

Hi @rajatpaliwal

The student agent continues to learn from data collected in the past as well as data being immediately collected in each training iteration.
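One way to picture "learning from past data as well as newly collected data" is a demonstration buffer that is only ever appended to, with each training batch sampled from the whole buffer. This is a conceptual sketch, not the ML-Agents trainer: `DemonstrationBuffer` and `run_online_imitation` are hypothetical names, observations are placeholders tagged with the iteration that produced them, and the real trainer's buffer size and sampling policy may differ.

```python
import random
from collections import Counter


class DemonstrationBuffer:
    """Hypothetical buffer that keeps every (observation, action) pair the teacher produced."""

    def __init__(self):
        self.data = []

    def add(self, obs, action):
        self.data.append((obs, action))

    def sample(self, batch_size, rng):
        """Uniformly sample with replacement from ALL stored pairs, old and new."""
        return [rng.choice(self.data) for _ in range(batch_size)]


def run_online_imitation(iterations=3, steps_per_iteration=5, batch_size=30, seed=0):
    rng = random.Random(seed)
    buffer = DemonstrationBuffer()
    history = []  # per iteration: how many batch samples came from each iteration
    for it in range(iterations):
        # 1) The teacher (Player brain) plays; each decision is recorded.
        for step in range(steps_per_iteration):
            obs = {"iteration": it, "step": step}
            action = rng.randrange(4)  # stands in for a keyboard-mapped discrete action
            buffer.add(obs, action)
        # 2) The student trains on a batch drawn from everything collected so far.
        #    A real update would fit the student to these actions; here we only
        #    record which iterations the batch came from.
        batch = buffer.sample(batch_size, rng)
        history.append(Counter(obs["iteration"] for obs, _ in batch))
    return buffer, history


buffer, history = run_online_imitation()
for it, counts in enumerate(history):
    print(f"iteration {it}: batch drew from iterations {dict(sorted(counts.items()))}")
```

Later batches mix data from earlier iterations with fresh data, so the student never restarts from only the current iteration's demonstrations.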

rajatpaliwal commented 5 years ago

Hi @awjuliani, after each iteration of training a separate .nn file is created, which is imported into the learning brain once training is done. Can you elaborate on how the student agent continues to learn from data collected in the past when a separate .nn file is created with every iteration? I would expect the student agent to learn from scratch in the current iteration.

awjuliani commented 5 years ago

Hi @rajatpaliwal

The .nn file is the final product of learning. It does not change once created.

rajatpaliwal commented 5 years ago

Hi @awjuliani. I absolutely agree that the .nn file is the final product of learning. That is why I was confused about how the student agent, which is controlled by the learning brain, continues to learn from data collected in the past. I think what you mean by that statement is either using RNNs to retain previous learning, or using the command-line option --load to load the parameters of a previously trained brain. Kindly correct me if I am wrong.
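The distinction in this exchange is between the parameters being updated in memory during a training run and an exported file, which is a frozen copy. The conceptual sketch below is not ML-Agents code: JSON stands in for the binary .nn format, `train_step` is a hypothetical stand-in for a gradient update, and the "resume" step only illustrates the idea behind the --load option the commenter mentions, not how ML-Agents implements it.

```python
import json
import os
import tempfile


def export_snapshot(params, path):
    """Write a frozen copy of the current parameters (JSON stands in for .nn here)."""
    with open(path, "w") as f:
        json.dump(params, f)


def load_snapshot(path):
    with open(path) as f:
        return json.load(f)


def train_step(params):
    """Hypothetical stand-in for one gradient update."""
    params["w"][0] += 1.0


live = {"w": [0.0, 0.0]}
path = os.path.join(tempfile.mkdtemp(), "student_snapshot.json")

for _ in range(3):
    train_step(live)           # learning accumulates in the live parameters
export_snapshot(live, path)    # the exported file is fixed from here on

for _ in range(2):
    train_step(live)           # further training does not touch the file

frozen = load_snapshot(path)
print(frozen["w"], live["w"])  # [3.0, 0.0] [5.0, 0.0]

# Resuming later: start from the saved values instead of from zero.
resumed = load_snapshot(path)
train_step(resumed)
print(resumed["w"])            # [4.0, 0.0]
```

Within one run, learning carries over because the live parameters persist; across runs, it carries over only if training starts from previously saved parameters.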

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had activity in the last 14 days. It will be closed in the next 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 5 years ago

This issue has been automatically closed because it has not had activity in the last 28 days. If this issue is still valid, please ping a maintainer. Thank you for your contributions.

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.