Closed by xiaosly 9 months ago
Hi. Could you provide more detailed information like error messages?
The off-policy algorithms require a long time for the warmup step. I am wondering whether this step is necessary, since my environment is not complex enough to warrant such a long warmup. Besides that, after the warmup step, the reward is weird and converges at the initial stage of training.
Sorry for the late reply. In the warmup step, only random action sampling takes place. It is not necessary and can be skipped if it takes too long. Unfortunately, without detailed information about your customized environment, we cannot explain why the reward is weird.
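To illustrate the point above, here is a minimal sketch of what a warmup stage typically does in an off-policy setup: it only fills the replay buffer with transitions from uniformly random actions, with no learning updates. The `ToyEnv`, `warmup`, and plain-list buffer below are hypothetical stand-ins for illustration, not HARL's actual API.

```python
import random

class ToyEnv:
    """Minimal stand-in multi-agent environment: 2 agents, 5 discrete actions."""
    n_agents = 2
    n_actions = 5

    def reset(self):
        self.t = 0
        return [0.0] * self.n_agents  # dummy per-agent observations

    def step(self, actions):
        self.t += 1
        obs = [float(a) for a in actions]
        rewards = [1.0] * self.n_agents
        done = self.t >= 10  # episode ends after 10 steps
        return obs, rewards, done

def warmup(env, buffer, warmup_steps):
    """Collect warmup_steps transitions using purely random actions."""
    obs = env.reset()
    for _ in range(warmup_steps):
        actions = [random.randrange(env.n_actions) for _ in range(env.n_agents)]
        next_obs, rewards, done = env.step(actions)
        buffer.append((obs, actions, rewards, next_obs, done))
        obs = env.reset() if done else next_obs
    return buffer

buffer = warmup(ToyEnv(), [], warmup_steps=50)
print(len(buffer))  # 50 transitions collected, no gradient updates performed
```

Since no learning happens here, shortening or skipping the loop only reduces the initial diversity of the replay buffer; in a simple environment that is usually an acceptable trade-off.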
The simulation with my customized environment cannot be run with the off-policy algorithms; it always gets stuck at the warmup stage. The on-policy algorithms work fine.