Open DavidSlayback opened 1 year ago
Hi @DavidSlayback! Thank you so much for raising this, for the suggestions, and for the kind words. And please excuse the late response. These are all good and valid points. What POMDP environments have you been working with? I am generally open to adapting the API and adding more (somewhat classic) environments to gymnax. It is a bit of a fine line, since I also want to avoid blowing up the package too much, and each new env will require some testing against a numpy version. With regards to your three points:
get_obs and terminal. Alternatively, one could absorb the action into the state, but I think treating it as a separate input is cleaner.
My hands are currently tied up with my internship, but feel free to open a PR! I would be happy to merge it in if all the unit tests pass. Also feel free to open PRs for your environments. I would love to see what you have been brewing up.
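For readers following the thread, the trade-off mentioned above (passing the action as a separate input versus absorbing it into the state) could be sketched roughly as below. This is only an illustration in plain JAX on a Tiger-style problem, not gymnax's actual API: the names `TigerState`, `TigerStateWithAction`, `get_obs`, `is_terminal`, `get_obs_from_state`, and the `accuracy` parameter are all hypothetical.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp

LISTEN = 0  # hypothetical action encoding: 0 = listen, 1 = open left, 2 = open right


class TigerState(NamedTuple):
    tiger_left: jnp.ndarray  # boolean scalar: True if the tiger is behind the left door


def get_obs(state, action, key, accuracy=0.85):
    """Option A: the observation function receives the action as a separate input."""
    # With probability `accuracy` the growl comes from the tiger's true side.
    correct = jax.random.bernoulli(key, accuracy)
    heard_left = jnp.where(correct, state.tiger_left, ~state.tiger_left)
    growl = jnp.where(heard_left, 1, 2)  # 1 = growl left, 2 = growl right
    # Only listening is informative; any other action yields the null observation 0.
    return jnp.where(action == LISTEN, growl, 0)


def is_terminal(state, action):
    """Option A for termination: opening either door ends the episode."""
    return action != LISTEN


class TigerStateWithAction(NamedTuple):
    """Option B: absorb the last action into the state instead."""
    tiger_left: jnp.ndarray
    last_action: jnp.ndarray


def get_obs_from_state(state, key, accuracy=0.85):
    # Same logic as Option A, but the action is read from the state itself.
    return get_obs(TigerState(state.tiger_left), state.last_action, key, accuracy)
```

Option A keeps the state minimal at the cost of a wider method signature. Option B keeps the signature unchanged, but every environment whose observations depend on the action has to carry an extra state field.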
Most of my POMDPs are the classic ones from the POMDP literature, as seen here: things like Tiger, HeavenHell, RockSample, and Hallway. I also have a few of my own, like a multistory FourRooms variant (with various observation functions), partially-observable Taxi, some modifications of continuous control domains, etc. I'm specifically interested in ones that require extremely long-term memory and reasoning.
I'll definitely look at doing some PRs for some of the more "classic" ones then! I think letting users pick among different observation functions for already-implemented environments at environment creation (like you already have in your FourRooms domain) could be a good way to expand some of these without expanding the repository too much.
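As a rough illustration of the idea above (choosing an observation function when the environment is created), something like the following might work. It is a minimal sketch in plain JAX and does not reflect how gymnax's FourRooms is actually implemented: `GridState`, `GridEnv`, `OBS_FNS`, and the `obs_type` argument are hypothetical names made up for this example.

```python
from typing import NamedTuple

import jax.numpy as jnp


class GridState(NamedTuple):
    pos: jnp.ndarray   # agent (row, col)
    goal: jnp.ndarray  # goal (row, col)


def full_obs(state):
    # Fully observable variant: agent position and goal position.
    return jnp.concatenate([state.pos, state.goal])


def position_only_obs(state):
    # Partially observable variant: the goal location is hidden.
    return state.pos


# Hypothetical registry of observation functions for a single environment.
OBS_FNS = {"full": full_obs, "position_only": position_only_obs}


class GridEnv:
    def __init__(self, obs_type="full"):
        # The observation function is fixed once, at environment creation.
        self._obs_fn = OBS_FNS[obs_type]

    def get_obs(self, state):
        return self._obs_fn(state)
```

With this layout, a new partially observable variant is one extra function and one registry entry, so the dynamics and the existing tests for the base environment stay untouched.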
Hey, first off, I love this project and the general idea of defining environments in JAX so that they can be easily batched and integrated into RL training loops!
I tend to do a lot of work with POMDPs and have built a few branches in my own fork that implement various POMDP environments. It works fine for my purposes, but I've run into a couple of instances where I just ignore the base Environment API.
Specifically:
Obviously I'm just overriding the methods with the extra arguments as needed for my own environments, but some of this might be common enough to justify a different base API?