-
While training, the number of frames used so far is computed as
`total_num_steps = (j + 1) * args.num_processes * args.num_steps`
Shouldn't this be multiplied by the number of stacked frames (def…
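For reference, a minimal sketch of that counter (variable values here are placeholders, not taken from the repo's defaults); one common reading is that frame stacking only re-packages already-collected frames into the observation, so it would not change this count:

```python
# Hypothetical values standing in for the training-loop variables
# (j, args.num_processes, args.num_steps in the original code).
j = 9                 # current update index (0-based)
num_processes = 8     # parallel environments
num_steps = 128       # env steps collected per process per update

# Each update collects num_processes * num_steps new environment frames;
# stacked frames are reused observations, not newly simulated frames.
total_num_steps = (j + 1) * num_processes * num_steps
print(total_num_steps)  # 10240
```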
-
Hi p-christ,
Thanks for this amazing contribution. Recently, I tried the implementation of DDPG for MountainCar (with default parameters in results/Mountain_Car.py). However, the results are quite …
-
Hello, how do I evaluate the model? I used the test command from GDPL but got a low success rate.
The command and outputs are below.
python main.py --test True --load model_rl/best > result.txt
DEBUG:r…
-
System Info
Describe the characteristics of your environment:
Describe how the library was installed: pip
sb3-contrib==1.5.1a9
Python: 3.8.13
Stable-Baselines3: 1.5.1a9
PyTorch: 1.11.0+cu102…
-
See https://github.com/pytorch/pytorch/issues/975 for more info
PyTorch TRPO appears to be 50% slower than the TF version. Not sure about PPO, but I expect the wall-clock time gap to be the same.
To fix this is…
-
# URL
- https://arxiv.org/abs/2312.16682
# Affiliations
- Jing Xu, N/A
- Andrew Lee, N/A
- Sainbayar Sukhbaatar, N/A
- Jason Weston, N/A
# Abstract
- Practitioners commonly align large langu…
-
Hello. I'm attempting to run learn.py on the hover test environment, and wondering if anyone has had any luck with this so far.
I admittedly haven't tried 1E12 training steps quite yet, but after …
-
Hello,
I just have a quick suggestion for the documentation:
Running the command
`tensorboard --logdir=~\ray_results\PPO\editor\ --port=8008`
in a terminal while/after the program is tra…
-
**Is your feature request related to a problem? Please describe.**
We have posted a paper with code, [RRHF](https://github.com/GanjinZero/RRHF), that can achieve human alignment without RLHF. RRHF ne…
-
First, I want to thank you for your great work. It's very rare to find a trading reinforcement learning system with PPO.
I have an error when I run this code.
Since I don't have talib installed, I replace…