Closed Guidosalimbeni closed 5 years ago
CNNs do encode positional information. However, in applications such as image classification, it is undesirable to have this information, and we force the network to be translation invariant through data augmentation and/or pooling. We don't do that in ML-Agents, so the CNN will know the position of the pixels.
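The point above can be illustrated with a small sketch (an illustrative example, not ML-Agents code): a plain convolution produces different feature maps when the same feature appears at different positions, so position is preserved; only after global pooling do the two responses collapse to the same value.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive 2D cross-correlation with 'valid' padding."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

kernel = np.ones((3, 3))  # a simple "blob detector" filter

# Two images that differ only in where the bright pixel sits
# (both placed away from the borders to avoid edge effects).
a = np.zeros((8, 8)); a[2, 2] = 1.0
b = np.zeros((8, 8)); b[5, 5] = 1.0

fa, fb = conv2d_valid(a, kernel), conv2d_valid(b, kernel)

# Without pooling, the feature maps differ: position is preserved.
print(np.array_equal(fa, fb))           # False
# After global (sum) pooling, the responses match: position is discarded.
print(np.isclose(fa.sum(), fb.sum()))   # True
```

This is why a CNN without aggressive pooling, as used for visual observations, can still tell the agent *where* things are in the frame.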
thank you!!
Thank you for the discussion. We are closing this issue due to inactivity. Feel free to reopen it if you’d like to continue the discussion though.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
3DBall example: what about using visual observations?
As far as I understand, visual observations in ml-agents are processed by a CNN. I understand that CNNs are very good at extracting features from an image but not very good at remembering the positions of the pixels (which is part of their strength for classification). Question: if, for example, I used visual observations for the 3DBall env in ml-agents, should I expect very bad performance? Can I assume that the visual observation, since it is processed by CNN layers, would not give the agent enough information to rotate the platform and balance the ball? On the other hand, I think the famous Atari Breakout challenge used a CNN for observations, so I must be completely wrong.
I am a bit confused, and any response that points me in the right direction would be really appreciated. Regards, Guido