Closed: MarkTension closed this issue 4 years ago
Hi @MarkTension, the general thinking is that if you can use vector observations, you'll usually have an easier time training. That said, there are scenarios where switching to visual observations makes sense. Perhaps @ervteng could provide some more input here.
Hi @MarkTension, the network in ML-Agents isn't pretrained on any images. I think in your case a visual observation (of exactly 36x36 and not bigger) would perform better than a flattened vector obs, as the convolutional layers are better at learning spatial relationships between the pixels, as you've suggested.
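To make the "spatial relationships" point concrete: a conv layer shares one small kernel across every pixel position, instead of learning a separate weight for each pixel the way a fully connected layer on a flattened input does. A rough per-layer parameter count (the layer sizes below are illustrative assumptions, not ML-Agents' actual defaults):

```python
# Rough per-layer parameter count for a 36x36 single-channel observation.
# Layer sizes are illustrative assumptions, not ML-Agents defaults.

H, W, C = 36, 36, 1

# Option A: flatten to a vector, feed one fully connected layer of 128 units.
flat_inputs = H * W * C                 # 1296 input values
mlp_params = flat_inputs * 128 + 128    # weights + biases

# Option B: one conv layer with 16 filters of 3x3, shared across all positions.
conv_params = (3 * 3 * C) * 16 + 16     # weights + biases

print(mlp_params)   # 166016
print(conv_params)  # 160
```

The conv layer's tiny shared kernel is what encodes the "nearby pixels are related" prior; the full visual encoder still ends up with more parameters overall once the conv output is flattened into dense layers.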
In particular, here's the code that sets up the different possible networks: https://github.com/Unity-Technologies/ml-agents/blob/298df2e122e7d9dcc5faed143a0c7934769a46d7/ml-agents/mlagents/trainers/models.py#L252-L418
(controlled by the `vis_encode_type` option in the trainer config)
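For reference, a sketch of where that option lived in the trainer config YAML of that era (the behavior name `GridAgent` is a placeholder; newer ML-Agents releases nest this under `network_settings` instead):

```yaml
# Trainer config sketch -- "GridAgent" is a placeholder behavior name.
GridAgent:
  trainer: ppo
  vis_encode_type: simple   # other values: nature_cnn, resnet
```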
Thank you for your input. I went ahead and tested it in the end. It is indeed a little slower to train in the beginning, since it has more parameters, but it actually reaches a higher optimum! The green line is trained with a visual observation of a render texture; the red one has a vector observation. The higher optimum isn't simply due to having more parameters, either: when I doubled the number of hidden units for the vector-observation run, its performance didn't increase.
Hope that helps anyone with the same question! Additionally, @ervteng , why did you say (of exactly 36x36 and not bigger)? Did you test a decreased performance with bigger pixel spaces? Thanks!
Hi @MarkTension, there's no reason a resolution above 36x36 won't work (it will work just fine), but since your grid is 36x36, bigger resolutions only introduce redundant information that the CNN has to learn to ignore, so training will take longer.
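To see why a larger-than-36x36 input adds only redundancy, consider nearest-neighbour upscaling: every source pixel just becomes a block of identical copies, so the CNN receives more values but no new information. A toy sketch in plain Python (sizes shrunk to 2x2 for readability):

```python
# Nearest-neighbour upscaling duplicates pixels without adding information.
def upscale(grid, factor):
    """Upscale a 2D list-of-lists by an integer factor (nearest neighbour)."""
    return [
        [row[x // factor] for x in range(len(row) * factor)]
        for row in grid
        for _ in range(factor)          # repeat each expanded row
    ]

small = [[0, 1], [1, 0]]                # toy 2x2 grid
big = upscale(small, 2)                 # 4x4, every pixel now a 2x2 block
print(big)  # [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
```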
Glad to see it's working out!
This issue has been automatically marked as stale because it has not had activity in the last 14 days. It will be closed in the next 14 days if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 28 days. If this issue is still valid, please ping a maintainer. Thank you for your contributions.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Hi all,
I'm trying to figure out exactly when to use a visual observation instead of a vector observation. In my case I just have a 36x36 grid of black and white pixels, with the agent moving around — quite similar to the GridWorld example environment.
Intuitively, it feels like a visual observation should be better, since convolutional layers are well suited to learning a visual state space, as in GridWorld.
On the other hand, if the network comes pre-trained on, say, natural images, I don't think its learned weights would generalize well to my two-color grid environment.
Right now I'm just flattening the grid into a vector and adding it as a vector observation.
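Concretely, by "flattening" I mean row-major unrolling of the 36x36 grid into a single 1296-element vector (a toy Python sketch with a dummy checkerboard pattern; my actual agent does the equivalent in its observation code):

```python
# Row-major flattening of a 36x36 binary grid into a 1296-element
# vector observation. The checkerboard pattern is just dummy data.
grid = [[(x + y) % 2 for x in range(36)] for y in range(36)]

flat = [cell for row in grid for cell in row]
print(len(flat))  # 1296
```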
Has anyone done some research on performance difference between vector observations and visual observations?