Closed by nileshop22, 2 years ago
Hmm, looking at the solitaire max reward: https://github.com/deepmind/open_spiel/blob/dbfb14322c8c3ebc089310032a56bfaed0dc4c01/open_spiel/games/solitaire.cc#L1556
That is a higher range than a lot of the other games. I am still surprised that you get NaNs though.
Maybe the learning rate is too high? I would first try lowering the learning rate of your DQN agent (maybe try 0.001 or 0.0005), because if that works then you don't have to modify the game.
But otherwise yes, you can always transform your rewards to [0,1] by computing (reward - game.min_utility()) / (game.max_utility() - game.min_utility()).
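To make the rescaling concrete, here is a minimal sketch of that min-max normalization. The `min_u` and `max_u` arguments stand in for the values returned by OpenSpiel's `game.min_utility()` and `game.max_utility()`; the numeric bounds below are illustrative, not Solitaire's actual utility range:

```python
# Min-max normalization of a reward into [0, 1].
# min_u / max_u stand in for game.min_utility() / game.max_utility().
def normalize_reward(reward, min_u, max_u):
    """Maps a reward from [min_u, max_u] linearly into [0, 1]."""
    return (reward - min_u) / (max_u - min_u)

# Example with illustrative (not Solitaire's actual) utility bounds:
print(normalize_reward(-100.0, -100.0, 500.0))  # 0.0 (worst outcome)
print(normalize_reward(500.0, -100.0, 500.0))   # 1.0 (best outcome)
print(normalize_reward(200.0, -100.0, 500.0))   # 0.5 (midpoint)
```

Note that subtracting `min_u` (rather than adding it) is what maps the minimum reward to 0, since `min_utility()` is typically negative.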
Hi @lanctot, thanks for your response. I implemented both of your suggestions and started training. I was wondering whether anyone has trained on Solitaire before and, if so, how much reward they obtained after training? Thanks.
I have not run it myself nor anybody in our team.
I will tag the original author of the game: @tyjch. Maybe they will have a better answer for you.
Hi @tyjch, can you please look into this? Also, if you did train, how did you modify your rewards so that training becomes stable? I'm unable to get good results with DQN using rewards in [0,1].
@nileshop22 @tyjch I'd love to see an implementation example for solitaire as well!
Hi @lanctot, first of all, this is an amazing collaborative effort! Thanks for this library. I tried training DQN on Solitaire in PyTorch by tweaking the code here, but the rewards are so high that my Q-values become NaN. Is there any way to resolve this? Is rescaling the rewards a good option? If so, how should I rescale them?
Any help would be really appreciated! Thanks.