testerpce closed this issue 5 years ago.
Hello, could you share the link to your Google Colab notebook?
Please update stable-baselines: SAC was only introduced in v2.4.0 (you have v2.2.1).
Doing:
!pip install stable-baselines --upgrade
will solve your issue.
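To confirm which version a Colab runtime actually has before and after upgrading, a minimal sketch like the following could help. It assumes Python 3.8+ (for `importlib.metadata`) and that the package is installed under the distribution name `stable-baselines`; `parse_version` and `sac_available` are hypothetical helpers, and the 2.4.0 threshold is the one stated above.

```python
import re
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn a release string like '2.4.0' or '2.10.1a0' into a comparable (major, minor, patch) tuple."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits, e.g. '1a0' -> 1
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad '2.4' to (2, 4, 0) so tuple comparison is fair
        parts.append(0)
    return tuple(parts)


def sac_available(installed: str, required: str = "2.4.0") -> bool:
    """Hypothetical helper: True if the installed stable-baselines is at least `required`."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    try:
        installed = metadata.version("stable-baselines")
    except metadata.PackageNotFoundError:
        print("stable-baselines is not installed")
    else:
        status = "OK for SAC" if sac_available(installed) else "too old for SAC, upgrade it"
        print(f"stable-baselines {installed}: {status}")
```

In Colab, the runtime may need a restart after `pip install --upgrade` so that the new version is the one actually imported.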
Please fill in the issue template completely next time ;) (notably the version of SB).
PS: I'll update the Colab notebook. I assumed it would download the latest version of SB automatically, which is apparently not the case.
OK, I will include the version of stable-baselines next time. Sorry, I did not notice that. Anyway, I am training Soft Actor-Critic on HumanoidBulletEnv-v0, and after a while the rewards go down; right now they are spiralling towards negative values. Are the SAC hyperparameters for HumanoidBulletEnv-v0 not tuned? I ask because I saw on the scores that SAC reaches up to 2048 reward on 149000 steps on HumanoidBulletEnv-v0. Or are the hyperparameters perhaps not tuned for the updated version of stable-baselines?
Because I saw on the scores that SAC reaches up to 2048 rewards on 149000 steps on HumanoidBulletEnv-v0.
The reported performance is after full training (2e7 steps in that case). The hyperparameters are not tuned yet, but they should give you OK results if you train fully.
Anyway, I am training Soft Actor-Critic on HumanoidBulletEnv-v0 and it seems that after a while the rewards go down and right now they are spiralling towards negative values.
I'm missing quite a lot of information here. What command did you use, how long did you wait, how many random seeds did you try?
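A single run that dips is hard to interpret without knowing how long it ran and how other seeds behave. As a minimal sketch, assuming you already have each run's per-episode rewards as a list of floats (how you extract them from the training logs is left open here), you could smooth each curve with a trailing moving average and compare the final smoothed values across seeds. `moving_average` and `summarize_seeds` are hypothetical helpers, and the default window of 100 episodes is an arbitrary choice.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def moving_average(rewards: Sequence[float], window: int) -> List[float]:
    """Trailing mean over the last `window` episodes (uses fewer episodes at the start)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed = []
    running = 0.0
    for i, reward in enumerate(rewards):
        running += reward
        if i >= window:
            running -= rewards[i - window]  # drop the episode that left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


def summarize_seeds(runs: Dict[int, Sequence[float]], window: int = 100) -> Dict[str, float]:
    """Mean and population std of each seed's final smoothed reward (non-empty runs only)."""
    finals = [moving_average(rewards, window)[-1] for rewards in runs.values() if rewards]
    if not finals:
        raise ValueError("no non-empty runs to summarize")
    return {"n_seeds": len(finals), "mean": mean(finals), "std": pstdev(finals)}
```

If only one seed ends up negative while the others keep improving, the drop is more likely run-to-run variance than a hyperparameter problem; with a single run there is no way to tell.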
!python train.py --algo sac --env HumanoidBulletEnv-v0
This is the command I used, so the number of timesteps is taken from the hyperparameters file, and I reported this after roughly 18000000 steps. Now, at the end of the whole training, it is giving negative values. I ran the program twice without changing anything; I just ran the above command twice. Should I try setting different random seeds manually through the command-line arguments?
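One way to try several seeds is to launch the same command once per seed. The sketch below only builds the command lines; `--algo`, `--env` and `--n-timesteps` appear in this thread, while the `--seed` flag is an assumption about `train.py`, so check `python train.py --help` in your copy of rl-baselines-zoo before relying on it. `build_train_commands` is a hypothetical helper.

```python
import shlex
from typing import List, Optional


def build_train_commands(algo: str, env: str, seeds: List[int],
                         n_timesteps: Optional[int] = None) -> List[List[str]]:
    """Build one train.py invocation per seed (assumes train.py accepts --seed)."""
    commands = []
    for seed in seeds:
        cmd = ["python", "train.py", "--algo", algo, "--env", env, "--seed", str(seed)]
        if n_timesteps is not None:
            # Overrides the n_timesteps value from the hyperparameters file.
            cmd += ["--n-timesteps", str(n_timesteps)]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in build_train_commands("sac", "HumanoidBulletEnv-v0", seeds=[0, 1, 2]):
        print(shlex.join(cmd))
        # subprocess.run(cmd, check=True)  # uncomment (and import subprocess) to launch each run
```

The runs are meant to be launched one after another, since a full SAC run on HumanoidBulletEnv-v0 is long and a single Colab session is unlikely to fit several in parallel.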
I am running rl-baselines-zoo for HumanoidBulletEnv-v0 in Google Colab. At first I ran it with PPO2 and it gave a very good result, with rewards going up to 1600. Now I am running Soft Actor-Critic and it is giving the following error.
!python train.py --algo sac --env HumanoidBulletEnv-v0 --n-timesteps 10000000