Dragon-Zhuang/BPPO
Author's PyTorch implementation of the ICLR 2023 paper Behavior Proximal Policy Optimization (BPPO).
MIT License
69 stars, 5 forks
issues
#6 Why it is necessary to use a ValueLearner? (xuruiyang, opened 4 months ago, 0 comments)
#5 how to infer the trained model? (shandongchong, opened 5 months ago, 0 comments)
#4 offline to online (shandongchong, opened 6 months ago, 0 comments)
#3 OFFLINE (shandongchong, closed 6 months ago, 0 comments)
#2 loss does not converge (daihuiao, opened 1 year ago, 2 comments)
#1 BPPO seems to involve online evaluation when doing offline training (typoverflow, closed 1 year ago, 8 comments)