Some questions about "train_baseline.py"

eugenevinitsky / sequential_social_dilemma_games

Repo for reproduction of sequential social dilemmas

MIT License


Status: Open. Aaricis opened this issue 3 years ago.

Aaricis commented 3 years ago

I ran train_baseline.py, and after some iterations I got output like this: [Screenshot 2020-09-21 155836]

The policy_reward_mean always equals 0. I do not know whether this result is correct.
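The question turns on what policy_reward_mean actually measures. RLlib-style trainers typically report it per policy, as the mean over recently completed episodes of each policy's summed episode reward. The sketch below assumes that definition. The function name and the `(agent_id, policy_id) -> total reward` episode format are hypothetical illustrations, not code from this repo or from RLlib.

```python
from collections import defaultdict
from statistics import mean


def policy_reward_means(episodes):
    """Average each policy's total episode reward over completed episodes.

    Assumed (hypothetical) input format: each episode is a dict mapping
    (agent_id, policy_id) -> summed reward for that agent in that episode.
    """
    per_policy = defaultdict(list)
    for agent_rewards in episodes:
        for (_agent_id, policy_id), total in agent_rewards.items():
            per_policy[policy_id].append(total)
    # One mean per policy, taken over that policy's per-episode totals.
    return {pid: mean(totals) for pid, totals in per_policy.items()}


# Two made-up episodes with two agents, each agent mapped to its own policy.
episodes = [
    {("agent-0", "agent-0"): 0.0, ("agent-1", "agent-1"): 3.0},
    {("agent-0", "agent-0"): 0.0, ("agent-1", "agent-1"): 1.0},
]
print(policy_reward_means(episodes))  # {'agent-0': 0.0, 'agent-1': 2.0}
```

Under this definition, a policy reads exactly 0 only when its per-episode totals average to zero. If rewards are never negative, that means every recorded episode gave that policy a total reward of 0.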