Closed blurLake closed 2 years ago
A follow-up to show that the environment is deterministic and reproducible. I set up the seed in a Python script called SAC_Tuning_test_zoo.py as follows:
# disable GPU
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""

seed_value = 0
# 1. Set `PYTHONHASHSEED` environment variable at a fixed value
os.environ['PYTHONHASHSEED'] = str(seed_value)
# 2. Set `python` built-in pseudo-random generator at a fixed value
import random
random.seed(seed_value)
# 3. Set `numpy` pseudo-random generator at a fixed value
import numpy as np
np.random.seed(seed_value)
# 4. Set the `tensorflow` pseudo-random generator at a fixed value
import tensorflow as tf
tf.random.set_random_seed(seed_value)
# 5. Configure a new global `tensorflow` session (single-threaded, for determinism)
from tensorflow.keras import backend as K
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
and then build the model with seed=0
model = SAC(
    CustomSACPolicy,
    env,
    gamma=0.1,
    tau=0.005,
    learning_rate=0.000533201801295971,
    buffer_size=10000,
    action_noise=None,
    verbose=1,
    batch_size=512,
    tensorboard_log="./zoo_repro_tensorboard/",
    ent_coef=0.05,
    train_freq=2,
    random_exploration=0.0,
    seed=0,
    learning_starts=1,
)
Here I run two separate trainings for 20 timesteps with seed=0 and print out the reward from each step. Both runs produce exactly the same rewards, as can be seen from the attached picture.
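For reference, a minimal sketch of that comparison (the environment, policy, and hyperparameters below are placeholders, not the exact ones from my script):

import gym
import numpy as np
from stable_baselines import SAC
from stable_baselines.sac.policies import MlpPolicy  # placeholder for CustomSACPolicy

class RewardLogger(gym.Wrapper):
    """Record the reward of every environment step so two runs can be compared."""
    def __init__(self, env):
        super(RewardLogger, self).__init__(env)
        self.rewards = []

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.rewards.append(reward)
        return obs, reward, done, info

def run_once(seed=0, timesteps=20):
    env = RewardLogger(gym.make("Pendulum-v0"))  # placeholder environment
    model = SAC(MlpPolicy, env, seed=seed, learning_starts=1, verbose=0)
    model.learn(total_timesteps=timesteps)
    return env.rewards

rewards_a = run_once(seed=0)
rewards_b = run_once(seed=0)
print(np.allclose(rewards_a, rewards_b))  # expected True if the run is fully deterministic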
Not sure if this is a Stable Baselines or an Optuna issue, but it would be great to have some suggestions from you. Thank you very much!
The solution is to add seed to the SAC hyperparameter candidate list explicitly, since the default is None. Now the different trials give the same value (since there is only one hyperparameter combination).
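For anyone hitting the same problem, a rough sketch of what the fix amounts to (the function mirrors the zoo's SAC sampler, but the exact file and signature may differ in your version):

def sample_sac_params(trial):
    # single candidate set used in my experiments
    return {
        "gamma": 0.1,
        "learning_rate": 0.000533201801295971,
        "batch_size": 512,
        "buffer_size": 10000,
        "train_freq": 2,
        "tau": 0.005,
        "ent_coef": 0.05,
        "learning_starts": 1,
        # without this entry, each trial falls back to the default seed=None
        "seed": 0,
    }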
Hi,
I am trying to have reproducible and deterministic results from zoo hyperparameter optimization. Right now I only have one set of hyperparameters as a candidate (attached below).
I set the seed at the beginning of train.py as follows.
I also modified set_global_seeds() in misc_util.py as follows (adding the Python seed and disabling the GPU, just in case).
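The exact code is not reproduced here, but the modification is roughly along the lines of the seeding block shown in the follow-up above:

import os
import random

import numpy as np
import tensorflow as tf

def set_global_seeds(seed):
    # disable GPU and fix the Python hash seed, just in case
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = ""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_random_seed(seed)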
There is some duplication, but that should be fine as long as the seed is 0 everywhere. Then I run train.py with seed=0 as follows.
I expect that, since the hyperparameters are the same, different trials should give the same reward, but that is not the case; see the attached picture. Even though the hyperparameters are the same, the results from trial 0 and trial 1 are different.
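To make the setup concrete, the optimization loop boils down to something like the following (the environment, policy, and evaluation are placeholder stand-ins for what the zoo's train.py actually does); the commented-out line is what fixing the seed amounts to:

import gym
import optuna
from stable_baselines import SAC
from stable_baselines.sac.policies import MlpPolicy  # placeholder for CustomSACPolicy

def objective(trial):
    # the single candidate set; without an explicit "seed" entry,
    # SAC is built with the default seed=None and the trials diverge
    params = {"gamma": 0.1, "learning_rate": 0.000533201801295971, "batch_size": 512}
    env = gym.make("Pendulum-v0")  # placeholder environment
    model = SAC(MlpPolicy, env, **params)            # seed=None -> trial 0 != trial 1
    # model = SAC(MlpPolicy, env, seed=0, **params)  # fixed seed -> identical trials
    model.learn(total_timesteps=1000)
    # placeholder evaluation: reward of one deterministic episode
    obs, done, episode_reward = env.reset(), False, 0.0
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, done, _ = env.step(action)
        episode_reward += reward
    return episode_reward

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=2)
print([t.value for t in study.trials])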