Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Interpreting offline behavioral cloning using .demo files #2626

Closed rajatpaliwal closed 4 years ago

rajatpaliwal commented 4 years ago

Hi all, I am just curious about the learning process involved when the student agent learns from a .demo file in offline behavioral cloning. Suppose an agent needs to learn to navigate a path while maintaining a particular orientation at certain places. The .demo files (which I believe are like video recordings) already contain the proper way of navigating that path with the correct orientation. I wonder whether the student agent goes over the file again and again to learn the behavior it needs to perform in a particular state.

Kindly clarify the process.

chriselion commented 4 years ago

Hi, The BehavioralCloning trainer will repeatedly use demo data to improve the model. You can see how the demonstration_buffer (which contains the loaded demonstration data) is used here: https://github.com/Unity-Technologies/ml-agents/blob/99e997c811176b163908f7c619b8aa9947e7e9b7/ml-agents/mlagents/trainers/bc/trainer.py#L126-L131
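The shape of that loop can be sketched roughly as follows. This is plain Python with a stand-in buffer and a hypothetical `update_policy` step; the names and numbers are illustrative, not the actual ml-agents API:

```python
import random

# Stand-in for the demonstration_buffer: a list of recorded
# (observation, action) pairs loaded from the .demo file.
demo_buffer = [([float(i), float(i) * 2.0], [i % 3]) for i in range(100)]

BATCH_SIZE = 32
NUM_EPOCHS = 3  # the trainer revisits the same demo data repeatedly


def update_policy(batch):
    """Hypothetical update step: in the real trainer this runs a
    gradient step on the network and returns the policy loss."""
    return len(batch)  # placeholder "loss" so the sketch runs


losses = []
for epoch in range(NUM_EPOCHS):
    random.shuffle(demo_buffer)  # reshuffle before each pass over the demos
    for start in range(0, len(demo_buffer), BATCH_SIZE):
        mini_batch = demo_buffer[start:start + BATCH_SIZE]
        losses.append(update_policy(mini_batch))

print(len(losses))  # 3 epochs * ceil(100 / 32) batches = 12 update steps
```

The key point is that the demonstration data is finite and fixed, so the trainer keeps cycling over it in shuffled mini-batches rather than collecting new experience.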

As has been mentioned in other issues, GAIL is now the recommended approach for imitation learning; I would suggest you switch to that if possible.

rajatpaliwal commented 4 years ago

Hi @chriselion
If I am understanding correctly, the trainer goes over the demonstration buffer again and again, updates itself on the actions that took place, creates mini-batches to train the neural net, and calculates the policy loss. Kindly correct me if I am wrong.

chriselion commented 4 years ago

Yes, I believe that's correct.

rajatpaliwal commented 4 years ago

I believe the same process takes place when using the GAIL approach for imitation learning. Also, are factors like the speed and orientation of the agent also learned from .demo files? If yes, how are inputs like speed and orientation extracted from the .demo files?

chriselion commented 4 years ago

First let's talk about the format of the demo files. The relevant source for writing them is here: https://github.com/Unity-Technologies/ml-agents/blob/8f4e6038c5e548061601f2323a3951a71c9ca2b7/UnitySDK/Assets/ML-Agents/Scripts/DemonstrationStore.cs#L58-L104

and for reading them at training time:

https://github.com/Unity-Technologies/ml-agents/blob/8f4e6038c5e548061601f2323a3951a71c9ca2b7/ml-agents/mlagents/trainers/demo_loader.py#L105-L124

So the format is: 1) a metadata protobuf, 2) the BrainParameters protobuf, and 3) several AgentInfo protobufs.
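Concretely, the messages sit back-to-back in the file, each prefixed with its length (the standard delimited protobuf framing). A minimal self-contained sketch of that container layout, using raw byte strings in place of real serialized protobufs — the varint helpers below are illustrative, not the ml-agents code:

```python
import io


def write_delimited(stream, payload: bytes) -> None:
    """Write a varint length prefix followed by the payload."""
    n = len(payload)
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            stream.write(bytes([byte | 0x80]))  # continuation bit set
        else:
            stream.write(bytes([byte]))
            break
    stream.write(payload)


def read_delimited(stream) -> bytes:
    """Read one varint-length-prefixed payload."""
    n, shift = 0, 0
    while True:
        b = stream.read(1)[0]
        n |= (b & 0x7F) << shift
        if not (b & 0x80):
            break
        shift += 7
    return stream.read(n)


# Fake stand-ins for the three kinds of messages in a .demo file.
meta = b"DemonstrationMetaProto"
brain = b"BrainParametersProto"
agent_infos = [b"AgentInfoProto#%d" % i for i in range(3)]

buf = io.BytesIO()
write_delimited(buf, meta)        # 1) metadata
write_delimited(buf, brain)       # 2) brain parameters
for info in agent_infos:          # 3) one AgentInfo per recorded step
    write_delimited(buf, info)

buf.seek(0)
assert read_delimited(buf) == meta
assert read_delimited(buf) == brain
assert [read_delimited(buf) for _ in range(3)] == agent_infos
```

The real demo_loader does essentially this with the generated protobuf classes: read the metadata and brain parameters once, then loop over AgentInfo messages until the file is exhausted.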

The AgentInfo protobufs contain the observations that the agent made when the demo was recorded. So if those observations include the speed and orientation (or anything else), they will be accounted for by GAIL and BC.
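In other words, speed and orientation are not extracted specially at training time; they are just floats that the agent wrote into its observation vector when the demo was recorded. A hypothetical illustration — the field names here are invented for clarity, not the actual AgentInfoProto schema:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecordedStep:
    """Stand-in for one AgentInfo entry in a .demo file."""
    vector_observations: List[float]  # whatever the agent observed that step
    actions: List[float]              # the demonstrator's action that step


# If the agent's observation collection added speed and yaw, they end up
# as ordinary entries in the observation vector...
step = RecordedStep(
    vector_observations=[1.5, -0.2,  # position x, z
                         0.8,        # speed
                         90.0],      # orientation (yaw, degrees)
    actions=[0.0, 1.0],
)

# ...and BC/GAIL simply feed that whole vector to the network as input;
# nothing singles out "speed" or "orientation" during training.
network_input = step.vector_observations
print(len(network_input))  # 4 observation values per recorded step
```

This is also why the agent being trained must collect the same observations, in the same order, as the agent that recorded the demonstration.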

rajatpaliwal commented 4 years ago

Thanks for the explanation. Everything makes sense now.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had activity in the last 14 days. It will be closed in the next 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 4 years ago

This issue has been automatically closed because it has not had activity in the last 28 days. If this issue is still valid, please ping a maintainer. Thank you for your contributions.

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.