Closed: JankowskiChristopher closed this issue 7 months ago.
Currently I am also running Humanoid-v4 and the behavior is really bizarre too. After 200k steps the agent is not learning: the entropy coefficient has risen to 20000 (and is still growing) and the critic loss to 8e+6.
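A note on why this coefficient can blow up, assuming the code uses SAC-style automatic entropy tuning (which CrossQ builds on): the temperature $\alpha$ is trained to keep the policy's entropy near a target $\bar{\mathcal{H}}$ by minimizing

$$J(\alpha) = \mathbb{E}_{a_t \sim \pi}\left[-\alpha \left(\log \pi(a_t \mid s_t) + \bar{\mathcal{H}}\right)\right],$$

whose gradient with respect to $\alpha$ is $H(\pi) - \bar{\mathcal{H}}$, so gradient descent keeps increasing $\alpha$ for as long as the policy's entropy stays below the target. An $\alpha$ of 20000 therefore suggests the policy has collapsed to near-deterministic actions and is not recovering.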
HalfCheetah-v4 seed 2: https://api.wandb.ai/links/krzysztofj/h2rte93m
@adityab thanks for updating the README. I created a fresh conda env and ran the code with the provided instructions. Unfortunately, the issue I faced still persists: after around 100k steps the rewards reach ~8k and then suddenly collapse to 0 (even negative). I tested 3 seeds in case the seed was causing the instability, but training is fairly similar across seeds.
I am running the code on a cluster with Titan V GPUs. I may try a different cluster with A100 GPUs, but I am not sure the GPU difference is the problem here. Could you please check that the code in the camera-ready version is the same as the code you used in your experiments? Maybe something accidentally got switched (it happens quite often).
Current wandb logs (HalfCheetah-v4, 400k steps).
We accidentally forgot to specify `bn_momentum` in the `crossq` hyperparam group in `train.py`. I just fixed this in master; training should work as expected now.
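For context, here is a minimal sketch of the kind of per-algorithm hyperparameter group the fix refers to, with `bn_momentum` set explicitly. The dict layout and key names below are illustrative assumptions, not the repo's actual `train.py`; 0.99 is the batch-renorm momentum the CrossQ paper reports.

```python
# Illustrative sketch only -- not the actual train.py from this repo.
# The failure mode: if "bn_momentum" is missing from the group, downstream
# code can silently fall back to a library default and destabilize training.
hyperparams = {
    "crossq": {
        "total_timesteps": 1_000_000,  # assumed key, illustrative value
        "policy_kwargs": {
            "use_batch_norm": True,    # assumed flag name
            "bn_momentum": 0.99,       # the previously missing key
        },
    },
}

# Reading the value defensively makes the omission loud instead of silent:
bn_momentum = hyperparams["crossq"]["policy_kwargs"].get("bn_momentum")
if bn_momentum is None:
    raise KeyError("bn_momentum not set for crossq; add it to the hyperparam group")
print(f"crossq bn_momentum = {bn_momentum}")
```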
HalfCheetah-v4 seed 2: https://api.wandb.ai/links/ias/40wrftpn
Thanks a lot for your report, @JankowskiChristopher!
Hello, could you please provide more detailed instructions in the README for reproducing your results? I ran your code (I had to slightly change the requirements due to conflicts, see: https://github.com/adityab/CrossQ/issues/2), but I cannot reproduce your paper's results. I am running the HalfCheetah-v4 environment and the agent trains perfectly up to ~100k steps, at which point the average rewards are around 8k, but they later drop suddenly to almost 0 (even negative values). I even ran this on 4 seeds, but the behavior persists.
The critic's loss rises to ~1e6, the Q-values take large negative values around -5000, and the entropy coefficient starts to rise. Could you please check with a fresh conda environment that everything in the code is correct, and provide more detailed step-by-step instructions on how to run the agent (the current README does not work)? If you find it helpful, I can share my wandb logs.