Closed: HagaiHargil closed this issue 1 year ago
@HagaiHargil, I have not used TorchRL myself, so I'm not sure what changes are needed to make it run. If you only want to train a PPO agent on MiniWorld environments, though, Stable-Baselines3 is fully supported.
If you want the environment to output a dictionary instead, you can create a wrapper; see here for an example.
@HagaiHargil It should be possible for you to transform the observations from `pixels` to `observation` with a custom wrapper.
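A minimal sketch of such a wrapper follows. It assumes the environment follows the Gymnasium-style `reset`/`step` API and that observations arrive as dictionaries containing a `pixels` key. `RenameObservationKey` is a hypothetical name, not something MiniWorld or TorchRL ships. It is written without dependencies so the idea is easy to see; with Gymnasium you would more likely subclass `gymnasium.ObservationWrapper` and also update `observation_space`, which this sketch leaves untouched.

```python
class RenameObservationKey:
    """Hypothetical helper: renames one key in dict observations.

    Assumes a Gymnasium-style env whose observations are dicts.
    """

    def __init__(self, env, old_key="pixels", new_key="observation"):
        self.env = env
        self.old_key = old_key
        self.new_key = new_key

    def _rename(self, obs):
        # Shallow-copy so the dict returned by the wrapped env is not mutated.
        obs = dict(obs)
        obs[self.new_key] = obs.pop(self.old_key)
        return obs

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        return self._rename(obs), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return self._rename(obs), reward, terminated, truncated, info

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself.
        # Guard against recursion if `env` has not been set yet.
        if name == "env":
            raise AttributeError(name)
        # Delegate everything else (action_space, render, close, ...).
        return getattr(self.env, name)
```

Usage would look like `env = RenameObservationKey(base_env)`, where `base_env` is whatever environment object currently returns the `pixels` key.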
Thanks for your comments. I was unfamiliar with Stable-Baselines3, so I'll try that out. I might go for the wrapper solution as well if necessary.
Question
I was trying to use a MiniWorld environment with TorchRL, which provides implementations of some of the more popular algorithms, but I ran into some issues. TorchRL expects specific key names in the results returned after each step, and these are incompatible with the key names used here.
For example, it looks for an "observation" key in the returned dictionary, but FourRooms responds with "pixels", which I believe is similar. Changing the expected key name results in another error when running `init_stats` (`RuntimeError: mean(): could not infer output dtype. Input dtype must be either a floating point or complex dtype. Got: Byte`).
Any advice on this? I'm mostly trying to use PPO and other more advanced policies with the MiniWorld environments.
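On the second error: PyTorch's `mean()` refuses integer tensors, and `Byte` is `uint8`, the usual dtype for image observations. So the likely cause is that the raw pixels reach the statistics computation without first being cast to floating point. The sketch below shows the conversion in plain NumPy rather than through TorchRL's own transforms. `to_float_image` is a hypothetical helper, and the frame shape is only a stand-in for a real observation.

```python
import numpy as np


def to_float_image(pixels: np.ndarray) -> np.ndarray:
    """Convert a uint8 image (values 0..255) to float32 scaled to [0, 1]."""
    if pixels.dtype != np.uint8:
        raise TypeError(f"expected uint8 pixels, got {pixels.dtype}")
    return pixels.astype(np.float32) / 255.0


# Stand-in for one RGB frame; the shape here is an assumption for illustration.
frame = np.full((60, 80, 3), 255, dtype=np.uint8)
as_float = to_float_image(frame)
print(as_float.dtype, as_float.mean())
```

Applying a conversion like this inside the observation wrapper, or through the equivalent image-to-float transform in the training library, should let the normalization statistics be computed.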