-
Remember that one benefit of policy gradient over Q-learning is that it can learn a stochastic policy, so we don't have to fine-tune the exploration schedule during training.
note that if we use softma…
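A minimal sketch of the idea above: with a softmax over action logits the policy is inherently stochastic, so exploration comes from sampling rather than from an epsilon schedule. This is an illustrative stdlib-only example, not code from the repo:

```python
import math
import random

def softmax_policy(logits, rng=random):
    """Sample an action index from a softmax (Boltzmann) distribution.

    Sampling from the softmax keeps the policy stochastic, so no
    separate epsilon-greedy exploration tuning is needed.
    """
    m = max(logits)                               # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # inverse-CDF sampling over the action probabilities
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i, probs
    return len(probs) - 1, probs                  # guard against rounding

action, probs = softmax_policy([1.0, 2.0, 0.5])
```

Higher logits get proportionally higher sampling probability, but every action keeps nonzero probability, which is exactly why the policy keeps exploring on its own.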
-
Hello
Thanks for sharing your code.
Could you please link the papers your code is based on?
-
Can you give me some advice on running this code on a GPU and rendering the window from the original envs?
thanks
-
Hi
I was trying to run `python a3c_main.py --evaluate 2 --load saved/pretrained_model` to run inference with the pre-trained model. However, I got the following dimension error without changing…
-
I ran the notebook without any changes on the vizdoom environment. After around an hour the reward became non-negative and peaked at around 0.7, but continuing to run the code resulted in the reward g…
-
For example, when I run `a2c.py -r "runs/a2c/a2c_cartpole.ini"`, tons of errors pop up.
Regardless, I like that you've implemented a lot of algorithms and put them here. It's very useful for someone new…
-
This error happened on the **CARLA server** when I used Leaderboard and ScenarioRunner to create my A3C training environment. Strangely, it appeared a few hours after the start of training. Does anyon…
-
Do you have a version of the code for Python 3.x + TensorFlow 2.x? That would help me run it on a platform that does not have Python 2.7 + TensorFlow 1.1.0.
-
I have a custom environment where the total reward is the sum of intrinsic reward and environmental reward.
I've configured the environment to emit the reward breakdowns as:
`info = {'agent0' : {'…
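A setup like the one described can be sketched with a small wrapper that adds an intrinsic bonus to the environment reward and reports the breakdown through `info`. This is a hypothetical illustration; the class name, `intrinsic_fn`, and the `reward_breakdown` key are all made up for the sketch and are not from the original code:

```python
class RewardBreakdownWrapper:
    """Illustrative wrapper: total reward = environmental + intrinsic,
    with the per-component breakdown emitted in `info`."""

    def __init__(self, env, intrinsic_fn):
        self.env = env
        self.intrinsic_fn = intrinsic_fn  # maps observation -> intrinsic bonus

    def step(self, action):
        obs, env_reward, done, info = self.env.step(action)
        intrinsic = self.intrinsic_fn(obs)
        # report the breakdown so training code can log each component
        info['reward_breakdown'] = {'intrinsic': intrinsic,
                                    'environmental': env_reward}
        return obs, env_reward + intrinsic, done, info
```

The agent then trains on the summed reward while the logger reads `info['reward_breakdown']` to track the two components separately.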
-
Hi alifanov, thanks for sharing your code, it's a very good example.
I'm trying to train simple-EC, but training feels very slow. Could it use more CPUs to train EC synchronously?
T…