Closed by nileshop22, 2 years ago
Hmm, looking at the solitaire max reward: https://github.com/deepmind/open_spiel/blob/dbfb14322c8c3ebc089310032a56bfaed0dc4c01/open_spiel/games/solitaire.cc#L1556
That is a higher range than a lot of the other games. I am still surprised that you get NaNs though.
Maybe the learning rate is too high? I would first try lowering the learning rate of your DQN agent (maybe try 0.001 or 0.0005), because if that works then you don't have to modify the game.
But otherwise yes, you can always transform your rewards to [0,1] by computing (reward - game.min_utility()) / (game.max_utility() - game.min_utility()).
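To make the rescaling concrete, here is a minimal sketch of that min-max normalization. The `min_u` and `max_u` arguments stand in for the values returned by OpenSpiel's `game.min_utility()` and `game.max_utility()`; the numeric bounds below are illustrative, not Solitaire's actual utility range:

```python
# Min-max normalization of a reward into [0, 1].
# min_u / max_u stand in for game.min_utility() / game.max_utility().
def normalize_reward(reward, min_u, max_u):
    """Maps a reward from [min_u, max_u] linearly into [0, 1]."""
    return (reward - min_u) / (max_u - min_u)

# Example with illustrative (not Solitaire's actual) utility bounds:
print(normalize_reward(-100.0, -100.0, 500.0))  # 0.0 (worst outcome)
print(normalize_reward(500.0, -100.0, 500.0))   # 1.0 (best outcome)
print(normalize_reward(200.0, -100.0, 500.0))   # 0.5 (midpoint)
```

Note that subtracting `min_u` (rather than adding it) is what maps the minimum reward to 0, since `min_utility()` is typically negative.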
Hi @lanctot, thanks for your response. I implemented both of your suggestions and started training. I was wondering whether anyone has trained on Solitaire before and, if so, how much reward they obtained after training? Thanks.
I have not run it myself nor anybody in our team.
I will tag the original author of the game: @tyjch. Maybe they will have a better answer for you.
Hi @tyjch, can you please look into this? Also, if you did train, how did you modify your rewards so that training becomes stable? I'm unable to get good results with DQN using rewards in [0,1].
@nileshop22 @tyjch I'd love to see an implementation example for solitaire as well!
Hi @lanctot, first of all, this is an amazing collaborative effort! Thanks for this library. I tried training DQN on Solitaire in PyTorch by tweaking the code here, but the rewards are so high that my Q-values become NaN. Is there any way to resolve this? Is rescaling the rewards a good option? If so, how should I rescale them?
Any help would be really appreciated! Thanks.