Dragon-Zhuang BPPO issues - Githubissues

Dragon-Zhuang / BPPO

Author's PyTorch implementation of the ICLR 2023 paper Behavior Proximal Policy Optimization (BPPO).

MIT License

69 stars, 5 forks

issues

Why is it necessary to use a ValueLearner?

#6 xuruiyang opened 4 months ago
0 comments

How to run inference with the trained model?

#5 shandongchong opened 5 months ago
0 comments

offline to online

#4 shandongchong opened 6 months ago
0 comments

OFFLINE

#3 shandongchong closed 6 months ago
0 comments

loss does not converge

#2 daihuiao opened 1 year ago
2 comments

BPPO seems to involve online evaluation when doing offline training

#1 typoverflow closed 1 year ago
8 comments