Issues
Khrylx / PyTorch-RL
PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.
MIT License · 1.09k stars · 185 forks
#37 · Failed on "'HopperEnv' object has no attribute 'seed'" · HughesField · opened 4 months ago · 0 comments
#36 · Why does GAIL get lower rewards the more it is trained? · ZXAXKL · opened 1 year ago · 1 comment
#35 · TRPO: Is fixed_log_probs the same as log_probs? · yongpan0715 · closed 2 years ago · 1 comment
#34 · Is the implemented performance comparable with the results in the original GAIL paper? · huang-fuxian · opened 2 years ago · 0 comments
#33 · A question about PPO implementation · pengzhi1998 · closed 2 years ago · 0 comments
#32 · About computing Hessian-vector products · jjjhfffjj · opened 2 years ago · 0 comments
#31 · What are conjugate gradients and line_search in TRPO? · Dreamlikec · opened 3 years ago · 1 comment
#30 · What are conjugate gradients and line_search in TRPO? · Dreamlikec · closed 3 years ago · 0 comments
#29 · Does this repository only work for Gym environments? · XueminLiu111 · opened 3 years ago · 0 comments
#28 · GAIL fails to train in the Ant-v2 environment · seolhokim · opened 3 years ago · 3 comments
#27 · Implementation problem · pengzhi1998 · closed 3 years ago · 6 comments
#26 · Various questions · lviano · opened 3 years ago · 1 comment
#25 · How are we using rewards in imitation learning? · SiddharthSingi · closed 3 years ago · 4 comments
#24 · Mountain Car · jpark0315 · opened 4 years ago · 0 comments
#23 · Doubt regarding the calculation of advantage · nesarasr · closed 4 years ago · 2 comments
#22 · Question on multiprocessing · pengzhi1998 · closed 4 years ago · 1 comment
#21 · About the KL divergence · yangyiqin-tsinghua · closed 4 years ago · 3 comments
#20 · Is this an error: num_steps += (t + 1)? · pprivulet · closed 4 years ago · 1 comment
#19 · Fuck · kaelgabriel · closed 4 years ago · 0 comments
#18 · GAIL discriminator loss uses complete expert data in each iteration? · SapanaChaudhary · closed 4 years ago · 4 comments
#17 · Question about A2C · kishanpb · opened 4 years ago · 0 comments
#16 · Confusion about advantage computation · gunshi · closed 5 years ago · 0 comments
#15 · Example for continued PPO training after GAIL? · signalprime · opened 5 years ago · 0 comments
#14 · Question about weight init · gunshi · closed 5 years ago · 2 comments
#13 · Inconsistent action shape when running CartPole-v1 · truongthanh96 · closed 5 years ago · 1 comment
#12 · Not able to run the TRPO example on GPU · avijit9 · closed 5 years ago · 1 comment
#11 · TRPO: KL divergence computation · sandeepnRES · closed 5 years ago · 1 comment
#10 · A few runtime errors · sandeepnRES · closed 5 years ago · 1 comment
#9 · Entropy term for GAIL · sandeepnRES · closed 5 years ago · 2 comments
#8 · CNN policy · bbalaji-ucsd · opened 5 years ago · 1 comment
#7 · Result is not good · ghost · closed 6 years ago · 2 comments
#6 · About the computation of advantage and state value in PPO · mjbmjb · closed 6 years ago · 2 comments
#5 · Concatenation of memories with a non-terminated episode · lcswillems · closed 6 years ago · 1 comment
#4 · Training a recurrent policy · erschmidt · opened 6 years ago · 4 comments
#3 · Autograd import error · aseembits93 · closed 6 years ago · 3 comments
#2 · Memory leak during GPU training · erschmidt · closed 6 years ago · 2 comments
#1 · CudnnRNN is not differentiable twice · erschmidt · closed 6 years ago · 4 comments