Dragon-Zhuang / BPPO

Author's PyTorch implementation of the ICLR 2023 paper Behavior Proximal Policy Optimization (BPPO).
MIT License

Why is it necessary to use a ValueLearner? #6

Open xuruiyang opened 4 months ago

xuruiyang commented 4 months ago

I have a question regarding the necessity of the ValueLearner. Given that we are training on the same offline dataset, why not take the return directly from the dataset when computing the advantage? What is the benefit of training a separate value model to predict the return?
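The trade-off behind the question can be sketched numerically: Monte Carlo returns read straight off the dataset require complete trajectories and have high variance, while a learned value baseline lets you bootstrap (e.g. via GAE, as in standard PPO-style advantage estimation). The sketch below is illustrative only, assuming made-up reward/value arrays; `mc_returns` and `gae_advantages` are hypothetical helpers, not functions from this repository.

```python
import numpy as np

def mc_returns(rewards, gamma=0.99):
    # Monte Carlo return: discounted sum of rewards to the episode end.
    # Needs the full trajectory; unbiased but high-variance.
    G = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    return G

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    # Generalized Advantage Estimation with a learned value baseline.
    # Bootstrapping from V(s) trades a little bias for lower variance
    # and works even on truncated trajectories.
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        next_v = values[t + 1] if t + 1 < T else 0.0  # assume terminal after T
        delta = rewards[t] + gamma * next_v - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv

# Toy 3-step trajectory with made-up value predictions.
rewards = np.array([1.0, 0.0, 1.0])
values = np.array([1.5, 0.8, 0.9])
print(mc_returns(rewards))        # dataset-only return estimate
print(gae_advantages(rewards, values))  # baseline-corrected advantages
```

The dataset return is a single noisy sample of the expected return, whereas a trained value model averages over all states it has seen, which is one common argument for learning V rather than using raw returns.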