-
In theory this should be feasible, but I don't know how to write it with Paddle PARL. Could the official team provide an implementation?
The references I was able to find are:
https://stackoverflow.com/questions/56226133/soft-actor-critic-with-discrete-action-space
https://www.spaces.ac.cn/archives/6705
Thanks.
…
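For anyone skimming this thread: the gist of the two references above is that with a discrete action space the expectation over actions can be computed exactly from the categorical policy, so the reparameterization trick of continuous SAC is not needed. Below is a minimal PyTorch sketch of that actor loss, not a PARL implementation; `logits`, `q1`, and `q2` are assumed `(batch, n_actions)` tensors and `alpha` a scalar temperature:

```python
import torch
import torch.nn.functional as F

def discrete_sac_actor_loss(logits, q1, q2, alpha):
    """Discrete-SAC actor loss: the expectation over actions is computed
    analytically from the categorical distribution, so no reparameterization
    trick is needed (unlike continuous SAC)."""
    probs = F.softmax(logits, dim=-1)
    log_probs = F.log_softmax(logits, dim=-1)
    q = torch.min(q1, q2)  # clipped double-Q to curb overestimation
    # E_{a~pi}[ alpha * log pi(a|s) - Q(s,a) ], summed exactly over actions
    return (probs * (alpha * log_probs - q)).sum(dim=-1).mean()
```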
-
Hi, I was thinking of incorporating the action (in addition to state) as a state-action pair input into the [rainbow dqn](https://nbviewer.jupyter.org/github/Curt-Park/rainbow-is-all-you-need/blob/mas…
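A minimal sketch of what such a state-action-pair critic could look like, assuming a one-hot encoded action concatenated to the state (purely illustrative, not the notebook's code; Rainbow's distributional and dueling heads would need further rework to fit this shape):

```python
import torch
import torch.nn as nn

class StateActionQNet(nn.Module):
    """Q-network consuming a state-action pair by concatenation,
    instead of outputting one Q-value per action as DQN-style nets do."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar Q(s, a)
        )

    def forward(self, state, action_onehot):
        return self.net(torch.cat([state, action_onehot], dim=-1))
```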
-
Hello,
I read the Soft Actor-Critic paper, and in the soft policy iteration section I saw a policy improvement step that uses the Kullback-Leibler divergence. Does soft actor critic for discrete actions also u…
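For reference, the policy improvement step in question projects the policy onto the exponentiated soft Q-values; in the temperature-parameterized form it reads:

```latex
\pi_{\text{new}} = \arg\min_{\pi' \in \Pi}
  D_{\mathrm{KL}}\!\left( \pi'(\,\cdot \mid s_t) \;\middle\|\;
  \frac{\exp\!\big(\tfrac{1}{\alpha}\, Q^{\pi_{\text{old}}}(s_t, \cdot)\big)}
       {Z^{\pi_{\text{old}}}(s_t)} \right)
```

For a discrete action space this KL divergence is a finite sum over actions, so it can be evaluated exactly rather than estimated from samples.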
-
This was assigned to me by @wakeuplearn
https://towardsdatascience.com/soft-actor-critic-demystified-b8427df61665
Spinning Up Docs are also good: https://spinningup.openai.com/en/latest/algorithms/…
-
## 🐛 Bug
I encountered this error, which (I think?) is a version-mismatch error:
`RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [tor…
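One common cause of this error in SAC training loops (a guess, since the trace is truncated): since PyTorch 1.5, `optimizer.step()` counts as an in-place modification of the weights, so backpropagating an actor loss through a graph that was built before the critic's `step()` raises exactly this `RuntimeError`. A schematic fix, with `compute_critic_loss` and `compute_actor_loss` as hypothetical helpers:

```python
def sac_update(batch, actor_opt, critic_opt,
               compute_critic_loss, compute_actor_loss):
    """Ordering that avoids the in-place error: finish the critic's
    backward + step before building the actor's graph."""
    critic_opt.zero_grad()
    compute_critic_loss(batch).backward()
    critic_opt.step()  # mutates critic weights in place

    # Build the actor's graph only now, after the critic step, so it
    # never references pre-step versions of the critic weights.
    actor_opt.zero_grad()
    compute_actor_loss(batch).backward()
    actor_opt.step()
```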
-
Hello,
Has anyone tried using images as input to train the network? I have worked on that for a couple of days, using a 3-layer conv net to process images in place of the original low-dimensional states, but…
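In case it helps, a minimal sketch of the kind of 3-layer conv encoder being described, here using the classic Nature-DQN filter sizes on an assumed 84x84 input (all names and sizes are illustrative):

```python
import torch
import torch.nn as nn

class PixelEncoder(nn.Module):
    """3-layer conv encoder replacing the low-dimensional state input;
    its flattened features feed the existing actor/critic MLP heads."""
    def __init__(self, in_channels=3, feature_dim=50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer flattened size from a dummy 84x84 frame
            n_flat = self.conv(torch.zeros(1, in_channels, 84, 84)).shape[1]
        self.fc = nn.Linear(n_flat, feature_dim)

    def forward(self, obs):
        return self.fc(self.conv(obs / 255.0))  # normalize raw pixel input
```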
-
Dear Author,
I took a quick look at your code for the actor update. It seems that you have used an advantage-style soft actor-critic, i.e.,
Advantage: `pol_target = q - v`
Loss: `pol_loss = (log_pi * (log_pi /…
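Reconstructing what the truncated loss line likely expresses (a guess, not the repository's actual code): an advantage-weighted score-function update where the weight is detached so gradients flow only through `log_pi`, e.g.:

```python
def advantage_actor_loss(log_pi, q, v, alpha):
    """Score-function (likelihood-ratio) actor update weighted by
    (alpha * log_pi - advantage), with the weight detached."""
    advantage = q - v  # `pol_target` in the issue's notation
    return (log_pi * (alpha * log_pi - advantage).detach()).mean()
```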
-
# OSR Dimensions | Stellarios
Dimensions for SpaceChallenge.tech
[https://acord-robotics.github.io//stellarios/awsjplosrdrl-dimensions/](https://acord-robotics.github.io//stellarios/awsjplosrdrl-dim…
-
Hi,
To run the SAC (soft actor-critic) algorithm with TensorFlow 2.0 and PyRep, one tricky thing is that we have to create two different environments: one for collection and one fo…
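A rough sketch of that two-environment setup, assuming a gym-style `reset`/`step` API; `make_env` stands in for whatever PyRep scene wrapper is in use:

```python
def evaluate(policy, eval_env, episodes=5):
    """Run evaluation episodes on a dedicated env instance, so the
    collection env's ongoing episode is never reset mid-rollout."""
    returns = []
    for _ in range(episodes):
        obs, done, total = eval_env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = eval_env.step(policy(obs))
            total += reward
        returns.append(total)
    return sum(returns) / len(returns)

# Two independent instances: one for collection, one for evaluation.
# collect_env = make_env(headless=True)
# eval_env    = make_env(headless=True)
```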
-
I am unable to find any place in the code where Q1 and Q2 are used asymmetrically, which makes me think: is Q2 completely redundant?
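The symmetry is likely the point: in the standard twin-critic (clipped double-Q) pattern both critics regress to the same target, and the `min` over their target copies is what curbs overestimation bias, so Q2 is not redundant. A sketch of that pattern, with the networks, target copies, and batch tensors all assumed:

```python
import torch

def twin_q_critic_loss(q1, q2, q1_target, q2_target, actor,
                       obs, act, reward, next_obs, done, gamma, alpha):
    """Clipped double-Q target: both critics regress to the same target;
    the min over target critics suppresses overestimation."""
    with torch.no_grad():
        next_act, next_log_pi = actor(next_obs)  # assumed actor API
        q_next = torch.min(q1_target(next_obs, next_act),
                           q2_target(next_obs, next_act))
        target = reward + gamma * (1.0 - done) * (q_next - alpha * next_log_pi)
    return (((q1(obs, act) - target) ** 2).mean()
            + ((q2(obs, act) - target) ** 2).mean())
```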