PKU-MARL / HARL

Official implementation of HARL algorithms based on PyTorch.

A tiny confusion about the 'state' concept #51

Closed: AquaHorseM closed this issue 1 month ago

AquaHorseM commented 2 months ago

I'm trying to fit my own env into the gym to use the algorithms; however, the step() function needs to return a 'state'. Is this the global state that contains all the information in the game, or the shared observation that is visible to all agents? If it is the former, should I include the shared observation in each agent's own returned observation?

Ivan-Zhong commented 2 months ago

Hi. The 'state' is the global state containing all the information in the game, which is what makes the problem a valid MDP. Each agent's observation should follow your environment design. Hope it helps.
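To make the distinction concrete, here is a minimal sketch of a toy two-agent environment whose step() returns both per-agent observations and a global state. Everything in it is hypothetical: the class name `TwoAgentLineEnv`, the helpers `_get_obs` and `_get_state`, and the order of the returned values are assumptions for illustration, not the interface HARL requires, so check the existing env wrappers in the repository before adapting it.

```python
from typing import Dict, List, Tuple


class TwoAgentLineEnv:
    """Toy environment (hypothetical, not part of HARL): two agents on a line."""

    def __init__(self, size: int = 5, horizon: int = 10) -> None:
        self.n_agents = 2
        self.size = size
        self.horizon = horizon
        self.positions: List[int] = [0, size - 1]
        self.t = 0

    def _get_obs(self) -> List[List[float]]:
        # Local observation: each agent sees only its own position.
        return [[float(p)] for p in self.positions]

    def _get_state(self) -> List[float]:
        # Global state: all positions plus the step counter, i.e. everything
        # needed to predict the next transition (the Markov property).
        return [float(p) for p in self.positions] + [float(self.t)]

    def reset(self) -> Tuple[List[List[float]], List[float]]:
        self.positions = [0, self.size - 1]
        self.t = 0
        return self._get_obs(), self._get_state()

    def step(
        self, actions: List[int]
    ) -> Tuple[List[List[float]], List[float], List[float], List[bool], List[Dict]]:
        # Action encoding (assumed): 0 = move left, 1 = stay, 2 = move right.
        for i, a in enumerate(actions):
            new_pos = self.positions[i] + (a - 1)
            self.positions[i] = min(self.size - 1, max(0, new_pos))
        self.t += 1

        met = self.positions[0] == self.positions[1]
        done = met or self.t >= self.horizon
        rewards = [1.0 if met else 0.0] * self.n_agents  # shared team reward
        dones = [done] * self.n_agents
        infos: List[Dict] = [{} for _ in range(self.n_agents)]
        # Return order is an assumption made for this sketch.
        return self._get_obs(), self._get_state(), rewards, dones, infos
```

Here the observation of each agent is deliberately smaller than the state: neither agent can see the other's position or the step counter, while the state contains both.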

AquaHorseM commented 2 months ago

Thank you for your reply! Sorry to bother you again: what is the most convenient way to adapt the algorithms (HASAC, for example) to offline training? Do you have any suggestions?

Ivan-Zhong commented 2 months ago

These algorithms are not inherently designed for offline settings, so they neither curate large offline datasets nor impose conservative training constraints. I think you may need to design new algorithms and construct offline datasets to achieve offline training.

AquaHorseM commented 2 months ago

Okay, thanks! Actually, I am only looking for a way to apply the algorithm to existing datasets, and I found where that should be done in the codebase. I will try it myself. Thank you anyway. Moreover, I found this definition here: [image] It seems that the 'state' returned by the 'step' function should be the shared observations? That sounds odd, but it makes sense according to the code. I will try the implementation to see whether it works, and I hope there will be a more precise guide on how to adapt the algorithms to one's own environment (or I would like to contribute one if I succeed). Again, thanks for your work!

Ivan-Zhong commented 1 month ago

Yeah, I understand. share_obs may sound confusing, but it is the state of the environment. It's just a naming convention I followed from the MAPPO codebase. Also, your contributions are welcome.
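As a rough illustration of the naming, the sketch below builds a `share_obs` value by handing every agent its own copy of the same global state. The helper `make_share_obs` is hypothetical, and whether a given wrapper expects a single state vector or one copy per agent is an assumption here, so it is worth checking against the env wrappers in the codebase.

```python
from typing import List


def make_share_obs(state: List[float], n_agents: int) -> List[List[float]]:
    """Return one independent copy of the global state per agent.

    Hypothetical helper: 'share_obs' here is just the environment state,
    replicated so that each agent's centralized critic input is identical.
    """
    return [list(state) for _ in range(n_agents)]


# Example: a global state with two agent positions and a step counter.
global_state = [1.0, 3.0, 1.0]
share_obs = make_share_obs(global_state, n_agents=2)
```

Each row of `share_obs` carries the full state, which is why the name reads like an observation even though it is not limited to what any single agent can see.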