The paper states that "Each agent i follows a shared policy". However, in the codebase I could only find implementations that resemble MAPPO's "SeperatedBuffer" and "SeperatedRunner", which are designed for the non-parameter-sharing setting. If the code and the paper are inconsistent on this point, it could lead to a discrepancy in performance. Does the codebase currently support only non-parameter-sharing MAPPO, or have I overlooked something?
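
To make the distinction I am asking about concrete, here is a minimal sketch of shared versus separated policies. It is not taken from the codebase or from MAPPO; `TinyPolicy`, `build_policies`, and `share_parameters` are hypothetical names used only for illustration, and the "policy" is just a weight vector rather than a real network.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class TinyPolicy:
    """Hypothetical stand-in for a policy network: a single weight vector."""
    obs_dim: int
    weights: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Randomly initialise the weights if none were supplied.
        if not self.weights:
            self.weights = [random.uniform(-1.0, 1.0) for _ in range(self.obs_dim)]

    def act(self, obs: List[float]) -> int:
        # Binary action chosen from the sign of a linear score.
        score = sum(w * o for w, o in zip(self.weights, obs))
        return 1 if score > 0 else 0


def build_policies(num_agents: int, obs_dim: int, share_parameters: bool) -> List[TinyPolicy]:
    """Return one policy handle per agent.

    share_parameters=True:  every agent refers to the same policy object,
                            so an update made for one agent affects all.
    share_parameters=False: every agent owns an independent policy,
                            which is the "separated" setting.
    """
    if share_parameters:
        shared = TinyPolicy(obs_dim)
        return [shared] * num_agents
    return [TinyPolicy(obs_dim) for _ in range(num_agents)]
```

With `share_parameters=True`, all entries of the returned list are the same object, which is what I understand the paper to describe. With `share_parameters=False`, each agent has its own parameters, which is what the separated buffer and runner appear to implement.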