Zhendong-Wang / Diffusion-Policies-for-Offline-RL

Apache License 2.0

Bad performance on pen environment #8

Open cccedric opened 1 year ago

cccedric commented 1 year ago

Hi Zhendong,

I ran the DQL code with the hyperparameters provided in the repository to test the algorithm on pen-cloned-v1, but the results I got are far from what the paper reports. The average score only reaches about 28, and the critic loss explodes to an extremely large value (about 1e10).

The evaluation result is shown below: (image: Screenshot from 2023-06-22 11-30-49)

The target Q mean result is shown below: (image: Screenshot from 2023-06-22 11-31-30)

The critic loss result is shown below: (image: Screenshot from 2023-06-22 11-32-23)

I then tested it on pen-human-v1 and got similarly poor results.

Have you encountered the same issue, and do you know how to solve it?

Thanks!

Zhendong-Wang commented 1 year ago

I am not sure which model selection method you are using. If you are using offline model selection, training should stop before the critic values blow up. For the online setting, I remember that the critic values sometimes exploded on Adroit tasks, but this does not affect the highest performance, which is where the final model is selected. Alternatively, you could strengthen the policy regularization term to keep the critic from exploding.

I reran some experiments, and on my machine the performance matched.
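
The offline model selection idea mentioned above, stopping before the critic values blow up, can be sketched roughly as follows. This is a minimal illustration and not code from the repository: the `select_checkpoint` function, the record layout, and the `loss_limit` threshold are all assumptions made for the example.

```python
import math


def select_checkpoint(history, loss_limit=1e4):
    """Return the last record logged before the critic loss diverges.

    history: list of dicts with keys "step" and "critic_loss"
             (hypothetical layout, for illustration only).
    loss_limit: assumed threshold above which the critic counts as diverged.
    """
    chosen = None
    for record in history:
        loss = record["critic_loss"]
        # Stop at the first non-finite or out-of-range critic loss.
        if not math.isfinite(loss) or loss > loss_limit:
            break
        chosen = record
    return chosen


# Made-up training log; the third entry mimics a critic loss near 1e10.
history = [
    {"step": 1000, "critic_loss": 3.2},
    {"step": 2000, "critic_loss": 5.1},
    {"step": 3000, "critic_loss": 2.0e10},
    {"step": 4000, "critic_loss": 4.0},
]
best = select_checkpoint(history)
print(best["step"])  # prints 2000
```

The selection relies only on a training-time signal (the critic loss) and never looks at evaluation returns, which is what makes it usable in a purely offline setting. A suitable threshold would have to be chosen per task.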

HenryZhang-git commented 12 months ago

Hi, bro. Could you tell me how to visualize the data? I have trained an agent and have a file named debug.log. Your reply is very important to me. Looking forward to your reply.

cccedric commented 12 months ago

I use TensorBoard to visualize the data :)
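
For readers with the same question, here is one possible way to get logged scalars into TensorBoard. It is only a sketch under stated assumptions: it assumes the metrics are available as CSV text with a `step` column (the actual format of debug.log is not described in this thread), and `read_scalars` and `write_to_tensorboard` are hypothetical helper names. `SummaryWriter` and `add_scalar` come from `torch.utils.tensorboard`.

```python
import csv
import io


def read_scalars(csv_text):
    """Parse CSV text into (tag, step, value) tuples.

    Assumes a header row containing a 'step' column; every other
    column is treated as a scalar tag (hypothetical layout).
    """
    scalars = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        step = int(row["step"])
        for tag, value in row.items():
            # Skip the step column and any empty or missing cells.
            if tag != "step" and value not in ("", None):
                scalars.append((tag, step, float(value)))
    return scalars


def write_to_tensorboard(scalars, log_dir="runs/example"):
    """Write (tag, step, value) tuples as TensorBoard scalars."""
    # Imported here so the parser above works without PyTorch installed.
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter(log_dir=log_dir)
    for tag, step, value in scalars:
        writer.add_scalar(tag, value, global_step=step)
    writer.close()


# Made-up example data, not real results.
example = "step,critic_loss,eval_score\n1000,3.2,10.5\n2000,5.1,28.0\n"
scalars = read_scalars(example)
print(scalars[0])  # prints ('critic_loss', 1000, 3.2)
```

After calling `write_to_tensorboard(scalars)`, running `tensorboard --logdir runs/example` should show the curves in the browser.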

HenryZhang-git commented 12 months ago

Thanks for your reply! I would also like to know: did you run this project on Ubuntu and use TensorBoard for visualization? Thank you very much!

cccedric commented 12 months ago

Yes, I ran the project on Ubuntu 20.04.

HenryZhang-git commented 12 months ago

Your reply is very helpful to me! Thank you! Looking forward to talking again!