trpo Search Results - Githubissues

783 results
for trpo


garymcintire/mpi_util #2

CalledProcessError: Command '['avconv', '-version']' returne…

Hi all, I am excited about the repo but met this error after running the train.py. Any idea or thoughts on that will be appreciated. I'm running in 3.16.0-4-amd64, and python 3.5.2 Thanks, [20…

FishQian updated 7 years ago
4
hill-a/stable-baselines #140

LSTM policies are broken for PPO1 and TRPO

See the `feature/fix_lstm` branch for [a test](https://github.com/hill-a/stable-baselines/blob/feature/fix_lstm/tests/test_lstm_policy.py) which [fails](https://travis-ci.com/hill-a/stable-baselines/b…

ernestum updated 4 years ago
10
wangshusen/DRL #13

A small question about TRPO

![微信截图_20210401182829](https://user-images.githubusercontent.com/13194870/113281468-2d185680-9318-11eb-90de-e676e6c521c0.png) Here the trajectories generated by theta_now are effectively used to estimate Q_phi(s,a). Isn't that a bit problematic? Are the parameters of Q_phi theta or theta_now…

kli-casia updated 3 years ago
6
IBM/rl-testbed-for-energyplus #84

TRPO and PPO Models don't train

@antoine-galataud @takaomoriyama Hello, I've been working on this project for a long time and I've trained the model using the TRPO policy for over 2000 epochs, but the reward would get stabilised e…

yashviagrawal updated 2 years ago
4
openai/baselines #1008

No Monitor Files for TRPO and DeepQ

Hello, executing `python -m baselines.run --alg="deepq" --env="QbertNoFrameskip-v4" --num_timesteps="1e4" --log_path="~/logs/"` will not produce any monitor.csv files. Only trpo and deepq are affect…

Lantc26 updated 5 years ago
2
openai/gym #3284

[Question] Changing action space with time/episode

What is the expected behaviour of on-policy and off-policy algorithms when the action space itself changes between episodes? Does this lead to non-stationarity? The action space is continuous. Typical case in Mujoco Ant…

prinshul updated 2 months ago
1
nosyndicate/pytorchrl #2

TRPO code is not performing well compared to Other implement…

The Pearlmutter method only gives a "good" value of the Hessian-vector product in the first two iterations of the conjugate gradient loop

nosyndicate updated 6 years ago
1
rll/rllab #150

A lot of scripts aren't working in example folder

All of the scripts trpo_gym_tf_cartpole.py, trpo_gym_tf_cartpole.py, trpo_cartpole_pickled.py, ddpg_cartpole.py stopped working just after the start with the same error message: ``` rllab\examples>…

ViktorM updated 7 years ago
4
Geonhee-LEE/rl-collision-avoidance #5

Implement RL algorithms

- Value based RL - [ ] DQN - [ ] Rainbow DQN - [ ] [CQL](https://sites.google.com/view/cql-offline-rl) - Value based + Policy based RL - [x] DDPG - [ ] [TD3](https://spinni…

Geonhee-LEE updated 4 years ago
5
long8v/PTIR #154

[142] Trust Region Policy Optimization

[paper](https://arxiv.org/pdf/1502.05477.pdf) ## TL;DR - **I read this because.. :** CS285 final assignment - **task :** reinforcement learning - **problem :** Is there a policy update method that is theoretically guaranteed to always improve performance…

long8v updated 3 months ago
1
