StepNeverStop / RLs

Reinforcement Learning Algorithms Based on PyTorch
https://stepneverstop.github.io
Apache License 2.0

about using gumbel_distribution to transform discrete space #20

Closed tanxiangtj closed 4 years ago

tanxiangtj commented 4 years ago

In the code you provided, the DDPG algorithm supports both continuous and discrete action spaces by using the Gumbel distribution. MADDPG is an extension of DDPG, so is it also suitable for discrete action spaces when the Gumbel distribution is used? When I use Gumbel in MADDPG, I cannot obtain reasonable results. I am using TensorFlow 1.14 and do not use the tensorflow_probability module. Could you give me some code examples of Gumbel in TF, or some instructions? Sorry to bother you.

StepNeverStop commented 4 years ago

I'm not sure whether Gumbel is suitable for MADDPG. Actually, MADDPG in my repo does not work well for now, and I'll fix it next. You can find something related to the Gumbel distribution here. I'm not familiar with how to use Gumbel with TF 1.x; maybe you could find related answers in other repos.

tanxiangtj commented 4 years ago

Thanks for your reply. In the MADDPG code supplied by OpenAI (https://github.com/openai/maddpg/), they use a Gumbel sample in distributions.py as follows:

    def sample(self):
        u = tf.random_uniform(tf.shape(self.logits))
        return U.argmax(self.logits - tf.log(-tf.log(u)), axis=1)

So maybe MADDPG is able to handle discrete action spaces...
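For readers without TensorFlow 1.x at hand, the sampling step quoted above can be sketched in NumPy. This is only an illustration of the Gumbel-max trick, not the OpenAI or RLs implementation; the function names `gumbel_max_sample` and `softmax`, the example logits, and the clipping constant are assumptions made for this sketch. It checks empirically that `argmax(logits - log(-log(u)))` draws actions with the probabilities given by `softmax(logits)`.

```python
import numpy as np


def gumbel_max_sample(logits, rng):
    """Draw one categorical sample per row via the Gumbel-max trick."""
    # Uniform noise; clip away from 0 and 1 so both logs stay finite.
    u = np.clip(rng.uniform(size=logits.shape), 1e-10, 1.0 - 1e-10)
    # -log(-log(u)) is standard Gumbel noise, so this matches
    # logits - log(-log(u)) from the quoted snippet.
    gumbel = -np.log(-np.log(u))
    return np.argmax(logits + gumbel, axis=1)


def softmax(x, axis=-1):
    """Numerically stable softmax, used only to compare against the samples."""
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


rng = np.random.default_rng(0)
logits = np.array([[1.0, 0.0, -1.0]])      # hypothetical policy logits
batch = np.repeat(logits, 20000, axis=0)   # same logits, many independent draws
actions = gumbel_max_sample(batch, rng)

freqs = np.bincount(actions, minlength=3) / actions.size
probs = softmax(logits)[0]
print("empirical frequencies:", freqs)
print("softmax probabilities:", probs)
```

Note that the argmax itself has no useful gradient, so this form only samples actions; it does not by itself let a critic's gradient reach the actor.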

StepNeverStop commented 4 years ago

Yes, Gumbel-softmax is a good trick for propagating gradients through discrete actions.

Since you have found the solution, this issue will now be closed. Feel free to re-open it.
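As a rough illustration of the Gumbel-softmax trick mentioned in this reply, here is a forward-pass sketch in NumPy. It is not the code used in this repository; `gumbel_softmax`, its `temperature` and `hard` parameters, and the example logits are all assumptions made for the sketch. The idea is to replace the argmax with a temperature-scaled softmax over the noisy logits, which is differentiable with respect to the logits. NumPy has no autodiff, so the straight-through step is described only in a comment.

```python
import numpy as np


def gumbel_softmax(logits, temperature, rng, hard=False):
    """Relaxed categorical sample: softmax((logits + Gumbel noise) / temperature)."""
    u = np.clip(rng.uniform(size=logits.shape), 1e-10, 1.0 - 1e-10)
    gumbel = -np.log(-np.log(u))
    y = (logits + gumbel) / temperature
    y = y - y.max(axis=-1, keepdims=True)          # for numerical stability
    soft = np.exp(y) / np.exp(y).sum(axis=-1, keepdims=True)
    if not hard:
        return soft
    # Discretise to one-hot for the forward pass. In an autodiff framework the
    # straight-through variant would return soft + stop_gradient(one_hot - soft),
    # so the backward pass still uses the gradient of `soft`.
    return np.eye(logits.shape[-1])[soft.argmax(axis=-1)]


logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])               # hypothetical actor outputs
# Re-using the same seed gives the same noise for the soft and hard variants.
soft = gumbel_softmax(logits, 0.5, np.random.default_rng(1))
hard = gumbel_softmax(logits, 0.5, np.random.default_rng(1), hard=True)
print(soft)
print(hard)
```

Lower temperatures push the soft sample closer to one-hot at the cost of higher-variance gradients, so the temperature is usually treated as a tunable hyperparameter.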

tanxiangtj commented 4 years ago

When you used Gumbel-softmax in DDPG to solve problems with discrete action spaces, did you get the desired outcome?

StepNeverStop commented 4 years ago

@tanxiangtj yes, it works well with gym

tanxiangtj commented 4 years ago

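Hello! Are you in Shanghai? Your work is very good. Would it be convenient to share your WeChat contact details? I would like to ask you for some advice.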

StepNeverStop commented 4 years ago

If you have any questions, you can contact me by email or in the issues.

tanxiangtj commented 4 years ago

OK, thank you! If you work out how to apply MADDPG to discrete action spaces, please give me some guidance.