Open jensen-Shen-Li opened 10 months ago
This might be due to random seeds behaving differently across devices. Since diffusion models fundamentally rely on random sampling, they are relatively sensitive to the random seed. Additionally, with the sparse rewards in AntMaze, an overfitted model can end up producing an output score of 0. The phenomenon you describe also occurs when training Diffusion QL [ICLR 2023] on antmaze-large. You can freely adjust target_kl and lambda_min to avoid this in the umaze task. In my code, the best results are recorded.
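The "best results are recorded" idea can be sketched as follows. This is a minimal illustration of the pattern, not the actual DiffCPS code; `eval_policy` and `save_checkpoint` are hypothetical names standing in for whatever the training loop uses:

```python
def train_with_best_tracking(num_steps, eval_every, eval_policy, save_checkpoint):
    """Track the best evaluation score seen so far, so a late collapse
    to 0 (e.g. overfitting under sparse AntMaze rewards) does not
    erase an earlier good result."""
    best_score = float("-inf")
    history = []
    for step in range(1, num_steps + 1):
        if step % eval_every == 0:
            score = eval_policy(step)          # e.g. normalized D4RL score
            history.append(score)
            if score > best_score:
                best_score = score
                save_checkpoint(step, score)   # snapshot the best policy
    return best_score, history
```

With this pattern, even if the score drops to 0 near the end of training, the reported number and saved checkpoint reflect the best evaluation along the run.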
Thank you for your reply. I also made a comparison with Diffusion QL [ICLR 2023], and, as you said, the same overfitting phenomenon occurred. In both DiffCPS and Diffusion QL I used the best hyperparameters recorded in the code, but because the seeds were not given, I could not reproduce the best scores reported in the papers. Could you share any tips on seed selection, or how to adjust the seeds to achieve the results in the paper?
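For what it's worth, a common way to make runs as repeatable as possible is to seed every RNG in one place. This is a generic sketch, not code from either repository, and even with identical seeds, results can still differ across GPU models and CUDA/cuDNN versions:

```python
import os
import random

import numpy as np


def set_seed(seed: int) -> None:
    """Seed Python, NumPy, and (if installed) PyTorch RNGs."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch  # optional: only if PyTorch is available
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Determinism at some speed cost; still not bit-identical
        # across different hardware or library versions.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass
```

Given the device-dependence the author mentions, sweeping a few seeds and reporting the best/mean is usually more informative than hunting for one "magic" seed.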
I cloned the repo and ran "python run --env_name antmaze-umaze-v0 --device 0 --lr_decay" without any modification. The results were good at first, but after 550,000 epochs they dropped to 0 and stayed there until the end. Are there any parameters I need to modify, or can the reported best score be reproduced with the released settings?