Dragon-Zhuang / BPPO

Author's PyTorch implementation of the ICLR 2023 paper Behavior Proximal Policy Optimization (BPPO).
MIT License

loss does not converge #2

Open daihuiao opened 1 year ago

daihuiao commented 1 year ago

[screenshot: training loss curves]

Is this normal, or is the environment not installed properly?

Lei-Kun commented 1 year ago

Hi, the reason the Q loss cannot converge is that we normalized the state but did not normalize the next state in the buffer.py file. We've updated the source code to fix this bug. Thank you for your careful check and valuable comments.
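For readers hitting the same issue, here is a minimal sketch of the kind of fix described above. The class and method names are illustrative, not the actual buffer.py API; the point is that state and next_state must be normalized with the same statistics:

```python
import numpy as np

class OfflineBuffer:
    """Minimal sketch of an offline replay buffer; field names are
    illustrative and do not mirror the actual buffer.py implementation."""

    def __init__(self, state, action, reward, next_state, done):
        self.state = np.asarray(state, dtype=np.float64)
        self.action = np.asarray(action, dtype=np.float64)
        self.reward = np.asarray(reward, dtype=np.float64)
        self.next_state = np.asarray(next_state, dtype=np.float64)
        self.done = np.asarray(done, dtype=np.float64)

    def normalize_states(self, eps=1e-3):
        # Compute normalization statistics from the states once.
        mean = self.state.mean(axis=0, keepdims=True)
        std = self.state.std(axis=0, keepdims=True) + eps
        self.state = (self.state - mean) / std
        # The fix: apply the SAME statistics to next_state. If next_state
        # stays unnormalized, the Q target Q(s', a') is computed on a
        # different input scale than Q(s, a), so the TD error never
        # shrinks and the Q loss fails to converge.
        self.next_state = (self.next_state - mean) / std
        return mean, std
```

Note that the same mean and std would also need to be applied to states coming from the live environment at evaluation time, which is why the sketch returns them.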

donkehuang commented 11 months ago

@Lei-Kun, hi, even with the next state normalized, the loss and score still don't show the effect we expect. [screenshot: loss and score curves]