Closed AquaHorseM closed 1 month ago

I'm trying to fit my own env into the gym interface so I can use the algorithms, but the step() function needs to return a 'state'. Is this the global state that contains all the information in the game, or the shared observation that is visible to all agents? If it is the former, should I put the shared observation into each agent's own returned observation?

Hi. The 'state' is the global state containing all information in the game; that is what makes the problem a valid MDP. The observation of each agent should follow your environment design. Hope this helps.
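For illustration, here is a minimal sketch of what a custom environment wrapper might look like, assuming a step() that returns per-agent observations plus a global state, followed by rewards, dones, and infos. The class and method names (`MyEnvWrapper`, `apply_actions`, `observe`, `hidden_info`, etc.) are hypothetical, and the exact return tuple should be checked against the existing environment wrappers in the codebase:

```python
import numpy as np

class MyEnvWrapper:
    """Hypothetical wrapper exposing a custom game through a multi-agent interface.

    This is a sketch under assumptions, not the repo's actual API: check the
    existing environment wrappers for the exact return tuple your env must provide.
    """

    def __init__(self, raw_env, n_agents):
        self.env = raw_env          # your own game/simulator (hypothetical)
        self.n_agents = n_agents

    def step(self, actions):
        # Advance the underlying game with one action per agent (hypothetical call).
        self.env.apply_actions(actions)

        # Per-agent (possibly partial) observations, per your own environment design.
        obs = [self.env.observe(i) for i in range(self.n_agents)]

        # Global state: everything needed to make the problem a valid MDP,
        # e.g. hidden variables plus all agents' observations.
        state = np.concatenate([self.env.hidden_info()] + obs)

        rewards = [self.env.reward(i) for i in range(self.n_agents)]
        dones = [self.env.done()] * self.n_agents
        infos = [{} for _ in range(self.n_agents)]
        return obs, state, rewards, dones, infos
```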
Thank you for your reply! Sorry to bother you again: how can I most conveniently modify the algorithms (HASAC, for example) for offline training? Do you have any suggestions?
These algorithms are not inherently designed for offline settings, so they neither rely on large curated offline datasets nor use conservative training constraints. I think you would need to design new algorithms and construct offline datasets to achieve offline training.
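As a rough illustration of that point, one common starting place is to replace online data collection with sampling from a fixed dataset of transitions. The sketch below assumes a pre-collected `.npz` file with the keys shown; all names are hypothetical, and it omits the conservative regularization that dedicated offline methods (e.g. CQL-style penalties) typically add on top:

```python
import numpy as np

class OfflineReplayBuffer:
    """Hypothetical buffer that serves transitions from a fixed offline dataset
    instead of collecting new ones by interacting with the environment."""

    def __init__(self, dataset_path):
        data = np.load(dataset_path)          # assumed .npz file with these keys
        self.obs = data["obs"]
        self.actions = data["actions"]
        self.rewards = data["rewards"]
        self.next_obs = data["next_obs"]
        self.dones = data["dones"]
        self.size = len(self.obs)

    def sample(self, batch_size):
        # Uniformly sample a batch of stored transitions.
        idx = np.random.randint(0, self.size, size=batch_size)
        return (self.obs[idx], self.actions[idx], self.rewards[idx],
                self.next_obs[idx], self.dones[idx])

# Training-loop sketch: skip environment rollouts and update only from the dataset.
# buffer = OfflineReplayBuffer("my_dataset.npz")
# for _ in range(num_updates):
#     batch = buffer.sample(256)
#     agent.update(batch)   # hypothetical update method of an off-policy learner
```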
Okay, thanks! Actually, I am only looking for a way to apply the algorithm to existing datasets, and I found where this should be done in the codebase, so I will try it myself. Thank you anyway. Also, I found the following definition here: it seems the 'state' returned by the 'step' function should be the shared observations? That sounds odd, but it makes sense according to the code. I will try the implementation and see if it works. I hope there will be a more precise guide on adapting the algorithms to one's own environment (or I would be happy to contribute one if I succeed). Again, thanks for your work!
Yeah, I understand. share_obs may sound confusing, but it is the state of the environment; it's just a naming convention I followed from the MAPPO codebase. Also, your contributions are welcome.
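To make the naming concrete, the sketch below shows one common convention from MAPPO-style buffers: the global state is replicated once per agent and stored as share_obs, while obs holds each agent's local observation. The helper name and shapes here are assumptions for illustration, not the repo's exact layout:

```python
import numpy as np

def build_share_obs(state, n_agents):
    """Replicate the global state so every agent's critic sees the same input.

    'share_obs' is just the environment state under a MAPPO-style naming
    convention; the shape change (state_dim,) -> (n_agents, state_dim) is assumed.
    """
    return np.tile(state, (n_agents, 1))

# Example: 3 agents, a 5-dimensional global state.
state = np.arange(5, dtype=np.float32)
share_obs = build_share_obs(state, n_agents=3)
print(share_obs.shape)   # (3, 5)
```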