-
I need that algorithm implemented here!!!
-
Hi, I am new to tianshou and RL. I created an env and ran it with PPO in tianshou, but I found that the sampled actions are out of range. I searched and found map_action, but it seems not to be used in tr…
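For context, the out-of-range problem described above is usually handled by bounding the raw policy output and rescaling it to the env's action limits. The snippet below is a minimal plain-Python sketch of that idea, not tianshou's own code: the function name `rescale_action`, the `bound_method` argument, and the example bounds are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def rescale_action(
    raw: Sequence[float],
    low: Sequence[float],
    high: Sequence[float],
    bound_method: str = "clip",
) -> List[float]:
    """Hypothetical helper: bound each raw value to [-1, 1], then map it to [low, high]."""
    if not (len(raw) == len(low) == len(high)):
        raise ValueError("raw, low and high must have the same length")

    mapped = []
    for a, lo, hi in zip(raw, low, high):
        if bound_method == "clip":
            # Hard-clip the unbounded sample into [-1, 1].
            a = max(-1.0, min(1.0, a))
        elif bound_method == "tanh":
            # Smoothly squash the unbounded sample into (-1, 1).
            a = math.tanh(a)
        else:
            raise ValueError(f"unknown bound_method: {bound_method!r}")
        # Linear map from [-1, 1] to [lo, hi].
        mapped.append(lo + (hi - lo) * (a + 1.0) / 2.0)
    return mapped


# Example with assumed bounds: three action dimensions.
print(rescale_action([3.0, -0.5, 0.0], low=[0.0, 0.0, -2.0], high=[10.0, 10.0, 2.0]))
```

With this kind of mapping the policy can keep sampling from an unbounded distribution during training, while the env only ever receives values inside its declared limits.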
-
# Learning to play Yahtzee with Advantage Actor-Critic (A2C) | dionhaefner.github.io
My in-laws are really into the dice game Yatzy (the Scandinavian version of Yahtzee). If you’re unfamiliar with th…
-
Comments for https://www.endpointdev.com/blog/2018/08/self-driving-toy-car-using-the-a3c-algorithm/
By Kamil Ciemniewski
-
I want to build a project using reinforcement learning in which one bot sends scam messages to other bots on social media, and the other bots detect the scam and reject it.
I think it needs a deep reinforcement learning…
-
Hello!
I noticed that the maximum number of episodes during training can be controlled by MAX_EPISODES, and that EVAL_INTERVAL sets the evaluation interval; however, the evaluation process seems to determi…
-
Here is my situation:
1. I finished step 2 with the cohere/zhihu_query dataset. The final reward score is 5.07, the rejected score is 0.8, and the acc is 0.79, so step 2 seems successful.
2. when I atte…