PKU-MARL / HARL

Official implementation of HARL algorithms based on PyTorch.

The issue with running off-policy algorithms #30

Closed xiaosly closed 9 months ago

xiaosly commented 10 months ago

The simulation with a customized environment cannot be run with the off-policy algorithms; it always gets stuck in the warmup stage. With the on-policy algorithms, it works fine.

guazimao commented 10 months ago

Hi. Could you provide more detailed information like error messages?

xiaosly commented 10 months ago

The off-policy algorithms require a long time for the warmup step. I am wondering if this step is necessary, since my environment is not so complex that it needs such a long warmup. Besides that, after the warmup step, the reward is weird and converges at the initial stage of training.


guazimao commented 10 months ago

Sorry for the late reply. During the warmup step, only random action sampling takes place, to pre-fill the replay buffer before learning starts. It is not strictly necessary and can be skipped if it takes too long. Unfortunately, without detailed information about your customized environment, we cannot explain why the reward is weird.
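
For anyone hitting the same question, here is a minimal sketch of what such a warmup phase typically does in an off-policy loop. This is illustrative only, not HARL's actual code: the environment API, the `buffer.insert` call, and the `warmup_steps` knob are hypothetical stand-ins.

```python
import numpy as np


def warmup(env, buffer, warmup_steps):
    """Fill the replay buffer with transitions from uniformly random actions.

    No gradient updates happen here; setting warmup_steps to a small value
    (or 0) effectively skips this phase.
    """
    obs = env.reset()
    for _ in range(warmup_steps):
        # Sample a random action per agent instead of querying the policies.
        actions = [space.sample() for space in env.action_spaces]
        next_obs, rewards, dones, _ = env.step(actions)
        buffer.insert(obs, actions, rewards, dones, next_obs)
        # Restart the episode once all agents are done.
        obs = env.reset() if np.all(dones) else next_obs
```

If your training config exposes a warmup-length parameter, lowering it for a simple environment should be safe, since the buffer only needs enough random transitions for the first gradient updates to draw from.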