-
Hi,
I'm re-running the LogP example using the current version of PyTorch, and execution stops in the reinforcement loop due to a TypeError, as shown below. Are you aware of any changes in PyTorch that co…
-
The following code generates an error in some of the most recent versions of PyTorch: https://github.com/microsoft/oac-explore/blob/cbc0333cc9b616f6bbca9d6d9cdd37fd29ef55e7/trainer/trainer.py#L146-…
-
The implementation of the gradient update in faa_model.py seems very constrained at best. It does not account for the case where I want to obtain the policy for a fine-tuning model; it only naivel…
-
https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html
https://talkingaboutme.tistory.com/entry/RL-Policy-Gradient-Algorithms
https://www.telesens.co/2019/04/21/understa…
-
Hi,
I am fairly new to OpenAI Gym.
When I try to run the test_policy_gradient.py file, I get the following error.
```
[2017-04-16 07:51:37,265] policy_gradient logger s…
```
-
Hi, I am a newcomer to DRL. While reading trpo_step in trpo.py, I noticed that you use a line search method instead of a trust-region method for numerical optimization. So I want to know why you chose that…
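For context, here is a minimal sketch of the kind of backtracking line search that is commonly paired with a trust-region step: a proposed step is shrunk until the surrogate objective improves and a KL bound holds. This is not the code from trpo.py; the function name, its parameters, and the toy objective are all assumptions made for illustration.

```python
def backtracking_line_search(theta, full_step, surrogate, kl, max_kl,
                             max_backtracks=10, shrink=0.5):
    """Shrink full_step until the surrogate improves and the KL bound holds.

    Hypothetical helper for illustration; not taken from trpo.py.
    Returns (new_theta, accepted).
    """
    base = surrogate(theta)
    for i in range(max_backtracks):
        # Try the full step first, then geometrically smaller fractions of it.
        candidate = theta + (shrink ** i) * full_step
        improved = surrogate(candidate) > base
        within_region = kl(theta, candidate) <= max_kl
        if improved and within_region:
            return candidate, True
    # No acceptable step found: keep the old parameters.
    return theta, False


# Toy stand-ins (assumptions): a concave objective with its maximum at 1.0,
# and half the squared distance in place of a real KL divergence.
def toy_surrogate(theta):
    return -(theta - 1.0) ** 2


def toy_kl(old, new):
    return 0.5 * (new - old) ** 2


# A deliberately oversized step of 4.0 from theta = 0.0.
new_theta, accepted = backtracking_line_search(
    0.0, 4.0, toy_surrogate, toy_kl, max_kl=0.2)
print(new_theta, accepted)
```

In this sketch the line search does not replace the trust region. The KL check is the trust-region constraint, and the search only enforces it, together with actual improvement, on the candidate step.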
-
Hi, thank you for your brilliant work; I have learned a lot from it.
I'm planning to do related research, but I can't find the code for the NeurIPS 2022 paper "The Policy-gradient Placement and Generativ…
-
-> Policy diverges quickly. As the gradients have been fixed (hopefully), the main suspects are probably one or a combination of these:
- Policy learning rate & value function learning rate (currently 0.…
-
[Policy Gradient Methods for Reinforcement Learning with Function Approximation](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf)
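For reference, the central result of this paper, the policy gradient theorem, can be written as below. The notation roughly follows the paper: $\rho$ is the policy's performance, $d^{\pi}(s)$ is the state distribution under policy $\pi$, and $Q^{\pi}(s,a)$ is the action-value function.

$$
\frac{\partial \rho}{\partial \theta} = \sum_{s} d^{\pi}(s) \sum_{a} \frac{\partial \pi(s,a)}{\partial \theta} \, Q^{\pi}(s,a)
$$

The gradient involves no derivative of $d^{\pi}(s)$ with respect to $\theta$, which is what makes it possible to estimate from sampled trajectories.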
-
In your policy_gradient.py file, how is the factor that self.mu is multiplied by determined?