Closed: JankowskiChristopher closed this issue 7 months ago.
Currently I am also running Humanoid-v4 and the behavior is really bizarre too. After 200k steps the agent is not learning: the entropy coefficient has risen to 20000 (and is still growing) and the critic loss to 8e+6.
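A note on why this coefficient can blow up, assuming the code uses SAC-style automatic entropy tuning (which CrossQ builds on): the temperature $\alpha$ is trained to keep the policy's entropy near a target $\bar{\mathcal{H}}$ by minimizing

$$J(\alpha) = \mathbb{E}_{a_t \sim \pi}\left[-\alpha \left(\log \pi(a_t \mid s_t) + \bar{\mathcal{H}}\right)\right],$$

whose gradient with respect to $\alpha$ is $H(\pi) - \bar{\mathcal{H}}$, so gradient descent keeps increasing $\alpha$ for as long as the policy's entropy stays below the target. An $\alpha$ of 20000 therefore suggests the policy has collapsed to near-deterministic actions and is not recovering.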
HalfCheetah-v4 seed 2: https://api.wandb.ai/links/krzysztofj/h2rte93m
@adityab thanks for updating the README. I created a fresh conda env and ran the code with the provided instructions. Unfortunately, the issue I faced still persists: after around 100k steps the rewards reach ~8k and then suddenly collapse to 0 (even negative). I tested 3 seeds in case the seed was causing the instability, but training is fairly similar across seeds.
I am running the code on a cluster with Titan V GPUs. I may try a different cluster with A100 GPUs, but I am not sure the GPU difference is the problem here. Could you please check that the code in the camera-ready version is the same as the code you used in your experiments? Maybe something accidentally got switched (it happens quite often).
Current wandb logs (HalfCheetah-v4, 400k steps).
We accidentally forgot to specify `bn_momentum` in the `crossq` hyperparam group in `train.py`. I just fixed this in master; training should work as expected now.
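For context, here is a minimal sketch of the kind of per-algorithm hyperparameter group the fix refers to, with `bn_momentum` set explicitly. The dict layout and key names below are illustrative assumptions, not the repo's actual `train.py`; 0.99 is the batch-renorm momentum the CrossQ paper reports.

```python
# Illustrative sketch only -- not the actual train.py from this repo.
# The failure mode: if "bn_momentum" is missing from the group, downstream
# code can silently fall back to a library default and destabilize training.
hyperparams = {
    "crossq": {
        "total_timesteps": 1_000_000,  # assumed key, illustrative value
        "policy_kwargs": {
            "use_batch_norm": True,    # assumed flag name
            "bn_momentum": 0.99,       # the previously missing key
        },
    },
}

# Reading the value defensively makes the omission loud instead of silent:
bn_momentum = hyperparams["crossq"]["policy_kwargs"].get("bn_momentum")
if bn_momentum is None:
    raise KeyError("bn_momentum not set for crossq; add it to the hyperparam group")
print(f"crossq bn_momentum = {bn_momentum}")
```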
HalfCheetah-v4 seed 2: https://api.wandb.ai/links/ias/40wrftpn
Thanks a lot for your report, @JankowskiChristopher!
Hello, could you please provide more detailed instructions in the README for reproducing your results? I ran your code (I had to slightly change the requirements due to conflicts, see: https://github.com/adityab/CrossQ/issues/2), but I cannot reproduce your paper's results. I am running the HalfCheetah-v4 environment and the agent trains perfectly up to ~100k steps, at which point the average rewards are around 8k, but they later drop suddenly to almost 0 (even negative values). I even ran this on 4 seeds, but the behavior persists.
The critic's loss rises to ~1e6, the Q-values take large negative values around -5000, and the entropy coefficient starts to rise. Could you please check with a fresh conda environment that everything in the code is correct, and provide more detailed step-by-step instructions on how to run the agent (the current README does not work)? If you find it helpful, I can share my wandb logs.