chaobiubiu / DDMA

Code for the paper "DDMA: Discrepancy-Driven Multi-Agent Reinforcement Learning" presented at PRICAI 2022.

paper #1

Open Xiyou521 opened 1 year ago

Xiyou521 commented 1 year ago

Hello, I can't find your paper. Has it been published yet? If possible, could you send a link to it? Thank you.

Xiyou521 commented 1 year ago

Hello, I ran your code for the DDMA algorithm in the Collision Corridor environment, and the selected action always outputs the same label: the action selection is 0, and the reward is also 0. In other people's MADDPG implementations the method is select_action(self, o, noise_rate, epsilon), but here the noise_rate parameter seems to have become epsilon. Is there a problem with this part? Thank you very much if you can answer.
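For reference, this is a minimal sketch of the select_action pattern common in public MADDPG implementations, not the repository's actual code; the attributes self.actor, self.action_dim, self.low_action, and self.high_action are assumptions for illustration:

```python
import numpy as np
import torch

def select_action(self, o, noise_rate, epsilon):
    # Sketch of a common MADDPG exploration scheme (hypothetical attributes):
    # with probability `epsilon`, take a uniformly random action;
    # otherwise take the actor's output plus Gaussian noise scaled by `noise_rate`.
    if np.random.uniform() < epsilon:
        return np.random.uniform(self.low_action, self.high_action, self.action_dim)
    obs = torch.as_tensor(o, dtype=torch.float32).unsqueeze(0)
    action = self.actor(obs).squeeze(0).detach().numpy()
    noise = noise_rate * self.high_action * np.random.randn(*action.shape)
    return np.clip(action + noise, self.low_action, self.high_action)
```

Under this pattern, epsilon and noise_rate play distinct roles (random-action probability vs. noise scale), so silently substituting one for the other would change the exploration behavior.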

chaobiubiu commented 1 year ago

Hello! Thanks for your attention to our work. The paper is in preprint, and I will post it here when it appears. For Collision Corridor, we implement DDMA based on MAPG, which equips each agent with a centralized critic and an actor from which discrete actions are sampled. For the MPE environments, we implement DDMA based on MADDPG, which outputs continuous actions. I look forward to your response if you have any further concerns.
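As a concrete illustration of the two actor variants described above, a minimal PyTorch sketch might look like this; the class names, layer sizes, and structure are assumptions for illustration, not the repository's code:

```python
import torch.nn as nn
from torch.distributions import Categorical

class DiscreteActor(nn.Module):
    """MAPG-style actor (sketch): samples a discrete action from a categorical policy."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        dist = Categorical(logits=self.net(obs))
        action = dist.sample()                    # sampled discrete action
        return action, dist.log_prob(action)      # log-prob for the policy gradient

class ContinuousActor(nn.Module):
    """MADDPG-style actor (sketch): deterministic continuous action in [-1, 1]."""
    def __init__(self, obs_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)
```

In the discrete (MAPG) case the gradient flows through the log-probability of the sampled action, while in the continuous (MADDPG) case the deterministic output is differentiated directly through the centralized critic.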

Xiyou521 commented 1 year ago

Thank you very much for your reply. I look forward to the publication of your paper.