How to train the RL policy through SB-3 MlpPolicy?

StanfordVL / OmniGibson

How to train the RL policy through SB-3 MlpPolicy? #697

Closed by wzjscut 2 months ago

wzjscut commented 6 months ago

I would like to write a reinforcement learning example based on the method in example/learning/navigation_policy_demo.py, and train the policy with MlpPolicy. However, I found that the observations defined in env_base.py are all in Dict format. Is there any way to convert them to Box format?

ChengshuLi commented 2 months ago

The task-relevant observations, i.e. obs["task"], are already flattened and can be processed by MlpPolicy.

Other observations come from onboard sensors, e.g. rgb, depth, etc., which generally should not be flattened. Consider processing them separately with dedicated encoders, e.g. a convolutional encoder such as SB3's CnnPolicy.