Issues
Khrylx / PyTorch-RL
PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.
MIT License · 1.09k stars · 185 forks
#37 · Failed on "'HopperEnv' object has no attribute 'seed'" · HughesField · opened 4 months ago · 0 comments
#36 · Why does GAIL get lower rewards the more it is trained? · ZXAXKL · opened 1 year ago · 1 comment
#35 · TRPO: Is fixed_log_probs the same as log_probs? · yongpan0715 · closed 2 years ago · 1 comment
#34 · Is the implemented performance comparable with the results in the original GAIL paper? · huang-fuxian · opened 2 years ago · 0 comments
#33 · A question about PPO implementation · pengzhi1998 · closed 2 years ago · 0 comments
#32 · About computing Hessian-vector products · jjjhfffjj · opened 2 years ago · 0 comments
#31 · What are conjugate gradients and line_search in TRPO? · Dreamlikec · opened 3 years ago · 1 comment
#30 · What are conjugate gradients and line_search in TRPO? · Dreamlikec · closed 3 years ago · 0 comments
#29 · Does this repository only work for Gym environments? · XueminLiu111 · opened 3 years ago · 0 comments
#28 · GAIL fails to train in the Ant-v2 environment · seolhokim · opened 3 years ago · 3 comments
#27 · Implementation problem · pengzhi1998 · closed 3 years ago · 6 comments
#26 · Various questions · lviano · opened 3 years ago · 1 comment
#25 · How are we using rewards in imitation learning? · SiddharthSingi · closed 3 years ago · 4 comments
#24 · Mountain Car · jpark0315 · opened 4 years ago · 0 comments
#23 · Doubt regarding the calculation of advantage · nesarasr · closed 4 years ago · 2 comments
#22 · Question on multiprocessing · pengzhi1998 · closed 4 years ago · 1 comment
#21 · About the KL divergence · yangyiqin-tsinghua · closed 4 years ago · 3 comments
#20 · Is this an error: num_steps += (t + 1)? · pprivulet · closed 4 years ago · 1 comment
#19 · Fuck · kaelgabriel · closed 4 years ago · 0 comments
#18 · GAIL discriminator loss uses complete expert data in each iteration? · SapanaChaudhary · closed 4 years ago · 4 comments
#17 · Question about A2C · kishanpb · opened 4 years ago · 0 comments
#16 · Confusion about advantage computation · gunshi · closed 5 years ago · 0 comments
#15 · Example for continued PPO training after GAIL? · signalprime · opened 5 years ago · 0 comments
#14 · Question about weight init · gunshi · closed 5 years ago · 2 comments
#13 · Inconsistent action shape when running CartPole-v1 · truongthanh96 · closed 5 years ago · 1 comment
#12 · Not able to run the TRPO example on GPU · avijit9 · closed 5 years ago · 1 comment
#11 · TRPO: KL divergence computation · sandeepnRES · closed 5 years ago · 1 comment
#10 · A few runtime errors · sandeepnRES · closed 5 years ago · 1 comment
#9 · Entropy term for GAIL · sandeepnRES · closed 5 years ago · 2 comments
#8 · CNN policy · bbalaji-ucsd · opened 5 years ago · 1 comment
#7 · Result is not good · ghost · closed 6 years ago · 2 comments
#6 · About the computation of advantage and state value in PPO · mjbmjb · closed 6 years ago · 2 comments
#5 · Concatenation of memories with a non-terminated episode · lcswillems · closed 6 years ago · 1 comment
#4 · Training a recurrent policy · erschmidt · opened 6 years ago · 4 comments
#3 · Autograd import error · aseembits93 · closed 6 years ago · 3 comments
#2 · Memory leak during GPU training · erschmidt · closed 6 years ago · 2 comments
#1 · CudnnRNN is not differentiable twice · erschmidt · closed 6 years ago · 4 comments