Open Aaricis opened 3 years ago
I ran train_baseline.py, and after some iterations I got output like this:
The policy_reward_mean always equals 0. I do not know whether this result is correct.