Closed: MarkTension closed this issue 4 years ago
Hi @MarkTension, the general thinking is that if you can use vector observations, you'll usually have an easier time training. That said, there are scenarios where switching to visual observations makes sense. Perhaps @ervteng could provide some more input here.
Hi @MarkTension, the network in ML-Agents isn't pretrained on any images. I think in your case a visual observation (of exactly 36x36 and not bigger) would perform better than a flattened vector obs, as the convolutional layers are better at learning spatial relationships between the pixels, as you've suggested.
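To make the "spatial relationships" point concrete: a conv layer shares one small kernel across every pixel position, instead of learning a separate weight for each pixel the way a fully connected layer on a flattened input does. A rough per-layer parameter count (the layer sizes below are illustrative assumptions, not ML-Agents' actual defaults):

```python
# Rough per-layer parameter count for a 36x36 single-channel observation.
# Layer sizes are illustrative assumptions, not ML-Agents defaults.

H, W, C = 36, 36, 1

# Option A: flatten to a vector, feed one fully connected layer of 128 units.
flat_inputs = H * W * C                 # 1296 input values
mlp_params = flat_inputs * 128 + 128    # weights + biases

# Option B: one conv layer with 16 filters of 3x3, shared across all positions.
conv_params = (3 * 3 * C) * 16 + 16     # weights + biases

print(mlp_params)   # 166016
print(conv_params)  # 160
```

The conv layer's tiny shared kernel is what encodes the "nearby pixels are related" prior; the full visual encoder still ends up with more parameters overall once the conv output is flattened into dense layers.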
In particular, here's the code that sets up the different possible networks: https://github.com/Unity-Technologies/ml-agents/blob/298df2e122e7d9dcc5faed143a0c7934769a46d7/ml-agents/mlagents/trainers/models.py#L252-L418
(controlled by the `vis_encode_type` option in the trainer config)
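For reference, a sketch of where that option lived in the trainer config YAML of that era (the behavior name `GridAgent` is a placeholder; newer ML-Agents releases nest this under `network_settings` instead):

```yaml
# Trainer config sketch -- "GridAgent" is a placeholder behavior name.
GridAgent:
  trainer: ppo
  vis_encode_type: simple   # other values: nature_cnn, resnet
```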
Thank you for your input. I went ahead and tested it in the end. It is indeed a little slower to train in the beginning, since it has more parameters, but it actually reaches a higher optimum! The green line is trained with a visual observation of a render texture; the red one has a vector observation. The higher optimum isn't simply due to having more parameters, either: when I doubled the number of hidden units for the vector-observation run, its performance didn't increase.
Hope that helps anyone with the same question! Additionally, @ervteng , why did you say (of exactly 36x36 and not bigger)? Did you test a decreased performance with bigger pixel spaces? Thanks!
Hi @MarkTension, there's no reason a resolution above 36x36 won't work (it will work just fine), but since your grid is 36x36, bigger resolutions only introduce redundant information that the CNN has to learn to ignore, so training will take longer.
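To see why a larger-than-36x36 input adds only redundancy, consider nearest-neighbour upscaling: every source pixel just becomes a block of identical copies, so the CNN receives more values but no new information. A toy sketch in plain Python (sizes shrunk to 2x2 for readability):

```python
# Nearest-neighbour upscaling duplicates pixels without adding information.
def upscale(grid, factor):
    """Upscale a 2D list-of-lists by an integer factor (nearest neighbour)."""
    return [
        [row[x // factor] for x in range(len(row) * factor)]
        for row in grid
        for _ in range(factor)          # repeat each expanded row
    ]

small = [[0, 1], [1, 0]]                # toy 2x2 grid
big = upscale(small, 2)                 # 4x4, every pixel now a 2x2 block
print(big)  # [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
```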
Glad to see it's working out!
This issue has been automatically marked as stale because it has not had activity in the last 14 days. It will be closed in the next 14 days if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 28 days. If this issue is still valid, please ping a maintainer. Thank you for your contributions.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Hi all,
I'm trying to figure out exactly when to use a visual observation instead of a vector observation. In my case I just have a 36x36 grid of black and white pixels, with the agent moving around — quite similar to the GridWorld example environment.
Intuitively, it feels like a visual observation should be better, since convolutional layers are well suited to learning a visual state space, as in GridWorld.
On the other hand, if the network comes pre-trained on, say, natural images, I don't think its learned weights would generalize well to my two-color grid environment.
Right now I'm just flattening the grid into a vector and adding it as a vector observation.
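Concretely, by "flattening" I mean row-major unrolling of the 36x36 grid into a single 1296-element vector (a toy Python sketch with a dummy checkerboard pattern; my actual agent does the equivalent in its observation code):

```python
# Row-major flattening of a 36x36 binary grid into a 1296-element
# vector observation. The checkerboard pattern is just dummy data.
grid = [[(x + y) % 2 for x in range(36)] for y in range(36)]

flat = [cell for row in grid for cell in row]
print(len(flat))  # 1296
```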
Has anyone done some research on performance difference between vector observations and visual observations?