-
Dear author,
Thank you very much for making such a great project; it has been very helpful to my research. But there is so much code and so many functions that I don't know where to start. Can you help me get started…
-
-
Within Tribler we aim to calculate and show the trust level of our neighbors.
Trust levels evolve over time as more information comes in from our blockchain. Restarting the calculation from scratch…
-
First, thanks for the amazing repository! I wanted to load a pretrained model from Hugging Face, which typically creates a folder containing the `config.json` and the `.bin` file with the weights inside.…
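For context, loading such a folder with the `transformers` library roughly looks like this (a minimal sketch; the directory path below is a placeholder, not from the original post):

```python
from transformers import AutoConfig, AutoModel

# Local folder containing config.json and pytorch_model.bin,
# e.g. produced by save_pretrained() or a Hub download (path is hypothetical)
model_dir = "path/to/pretrained_model"

config = AutoConfig.from_pretrained(model_dir)
model = AutoModel.from_pretrained(model_dir, config=config)
```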
-
Hi. Thank you for the great work.
### Describe the bug
I am trying to train a virtual robot with multiple mimic joints, more specifically like this: https://github.com/KKSTB/isaac_lab_gundam_rob…
-
Hello @miyosuda,
Thanks for sharing the code. Please ignore the title: I tried out your code on the cartpole balance control problem instead of the Atari games, and it works well. But a few ques…
-
todo: add KL penalty between current and marginal policy as an intrinsic reward/penalty
log π(a|s)/p(a)
the question is whether this will induce perseveration
the only thing to figure out is how to …
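A minimal sketch of the penalty term in the note above, assuming a discrete action space and PyTorch; the marginal p(a) is estimated here with a running average, and every name below is hypothetical:

```python
import torch

def kl_intrinsic_penalty(policy_logits, action, marginal_probs, beta=0.01):
    """Sampled estimate of beta * log(pi(a|s) / p(a)) for the taken action."""
    log_pi = torch.log_softmax(policy_logits, dim=-1)  # log pi(.|s)
    log_p = torch.log(marginal_probs)                  # log p(.)
    return beta * (log_pi[action] - log_p[action])

def update_marginal(marginal_probs, policy_probs, tau=0.005):
    """Exponential-moving-average estimate of the marginal policy p(a)."""
    return (1.0 - tau) * marginal_probs + tau * policy_probs
```

Subtracting this term from the environment reward penalizes actions the policy takes more often than its own long-run marginal; adding it instead would reward them, which is where the perseveration question comes in.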
-
There are several optimizations to our PPO recipe that could push its performance closer to SOTA. There are also several pieces of documentation we could offer alongside this recipe t…
-
Also, is your code based on the paper with new modifications? The code involves A2C-like strategies that don't seem to be presented in the paper, which is a bit unclear to me. I hope you can help.
-
Hello,
We recently fixed a bug in the ppo2 implementation that should solve the performance gap observed ;)
So I recommend you update to the latest version. Btw, I'm quite interested in your benchmar…