Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Observation State encoder? Autoencoder? #2422

Closed FlimFlamm closed 5 years ago

FlimFlamm commented 5 years ago

I'm interested in finding creative ways to "encode" observation states as a means of overcoming the "curse of dimensionality" (in the context of one or more networks encoding their own observational data before sending it to another).

From what I know about auto-encoding, most of the low-hanging functionality comes in the form of CNN-like networks, where one half does basic feature extraction (it compresses many inputs into fewer values) and the other half (which is like a reversed CNN) tries to reconstruct the original observations from the "encoded" layer in the middle (the bottleneck of the network, where the data is most highly compressed).
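To make the encoder/bottleneck/decoder idea concrete, here is a minimal sketch of a linear autoencoder in plain Python. It is not ML-Agents code and not a CNN: the 4-value observations, the 2-value bottleneck, the learning rate, and every function name are assumptions chosen for illustration. The point it shows is that a single reconstruction error is backpropagated through both halves in one step.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def reconstruction_loss(encoder, decoder, observations):
    """Mean squared reconstruction error per observation."""
    total = 0.0
    for obs in observations:
        recon = matvec(decoder, matvec(encoder, obs))
        total += sum((r - o) ** 2 for r, o in zip(recon, obs))
    return total / len(observations)

def train_step(encoder, decoder, observations, lr):
    """One gradient-descent step through BOTH halves at once."""
    n = len(observations)
    grad_enc = [[0.0] * len(row) for row in encoder]
    grad_dec = [[0.0] * len(row) for row in decoder]
    for obs in observations:
        code = matvec(encoder, obs)        # compressed ("encoded") observation
        recon = matvec(decoder, code)      # attempted reconstruction
        err = [2.0 * (r - o) for r, o in zip(recon, obs)]
        # Gradient for the decoder weights.
        for i, e in enumerate(err):
            for j, c in enumerate(code):
                grad_dec[i][j] += e * c
        # The same error flows back through the decoder into the encoder.
        d_code = [sum(err[i] * decoder[i][j] for i in range(len(decoder)))
                  for j in range(len(code))]
        for j, d in enumerate(d_code):
            for k, x in enumerate(obs):
                grad_enc[j][k] += d * x
    for grads, weights in ((grad_enc, encoder), (grad_dec, decoder)):
        for g_row, w_row in zip(grads, weights):
            for idx, g in enumerate(g_row):
                w_row[idx] -= lr * g / n

def make_observations(count, rng):
    """Hypothetical data: 4 observed values driven by 2 hidden factors."""
    data = []
    for _ in range(count):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append([a, b, a + b, a - b])
    return data

rng = random.Random(0)
observations = make_observations(64, rng)
encoder = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(2)]  # 4 -> 2
decoder = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(4)]  # 2 -> 4

initial_loss = reconstruction_loss(encoder, decoder, observations)
for _ in range(300):
    train_step(encoder, decoder, observations, lr=0.05)
final_loss = reconstruction_loss(encoder, decoder, observations)
print(f"reconstruction loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Once trained, only `matvec(encoder, obs)` would be needed to produce the compressed observation for a downstream network; the decoder exists purely to give the encoder a training signal.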

Normally we would add a "head" (a fully connected layer or two, trained by backpropagation) to actually recognize (categorize) higher-level observations coming out of the CNN "encoder". The reason the CNN is useful (the reason we don't just use a wider fully connected network) is that there are often too many inputs, which would result in a massive overall network that is computationally impractical. In other words, the CNN does some rough pre-processing and summarizing of relevant patterns that exist in observational data, and the "head" network can then work from the high-level recognized patterns (albeit encoded) with much higher efficiency.

Is there presently any way to implement observation encoding within the ML-Agents toolkit? I'm working on a project that is starting to suffer from input quantity ("the curse of dimensionality"), and this could be a very intelligent approach to overcoming it. I attempted to implement an auto-encoder with two PPO networks, but because the networks themselves have to be separate, backpropagation occurs separately for each of them, whereas (AFAIK) backpropagating through both halves together is a crucial step in training autoencoders. The problem is that the encoder side doesn't know its ideal outputs (its outputs are the encoded observations, which are exactly what it is trying to learn).

I considered trying to jury-rig some kind of system that uses camera inputs to accomplish this goal, but it's likely too complicated. I'm not exactly a brilliant programmer, or experienced with Unity or C#, so any insight or advice anyone can offer would probably be helpful.

ervteng commented 5 years ago

Hi @FlimFlamm, it's definitely doable without touching the C# or Unity side. You'd have to be pretty familiar with TensorFlow, though, and modify models.py to create an autoencoder instead of the usual visual encoder. Then you'll have to add a loss for the encoding that measures how closely the output of the decoder matches the input, and add it to the PPO loss.
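As a rough illustration of "add it to the PPO loss", the sketch below combines a policy loss with a reconstruction term into one scalar. Everything here is an assumption for illustration: `combined_loss`, `mean_squared_error`, and `recon_weight` are hypothetical names, not part of models.py or the ML-Agents trainer, and mean squared error is just one possible choice of reconstruction measure. In a TensorFlow model the same sum would be built from tensors so that one optimizer step updates the policy, the encoder, and the decoder together.

```python
def mean_squared_error(observations, reconstructions):
    """Average squared difference over every observed value."""
    total, count = 0.0, 0
    for obs, recon in zip(observations, reconstructions):
        for o, r in zip(obs, recon):
            total += (o - r) ** 2
            count += 1
    return total / count

def combined_loss(ppo_loss, observations, reconstructions, recon_weight=0.5):
    """Hypothetical total loss: PPO loss plus a weighted reconstruction term.

    recon_weight is an assumed hyperparameter that balances the two terms.
    """
    return ppo_loss + recon_weight * mean_squared_error(observations, reconstructions)

# Example with made-up numbers: one observation, reconstructed imperfectly.
loss = combined_loss(1.0, [[1.0, 2.0]], [[1.0, 4.0]], recon_weight=0.5)
print(loss)
```

Because both terms end up in one scalar, the encoder no longer needs to know its "ideal outputs": the reconstruction term supplies its training signal, which is what the two separate PPO networks could not provide.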

If there's an existing implementation, you might also be able to use our gym_unity wrapper to interact with your Unity project instead of the PPO trainer.

chriselion commented 5 years ago

Thank you for the discussion. We are closing this issue due to inactivity. Feel free to reopen it if you’d like to continue the discussion though.

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.