chncyhn / flappybird-qlearning-bot

Flappy Bird Bot using Reinforcement Learning
MIT License

Any suggestions on how to improve the algo? #9

Closed Abhishek150598 closed 4 years ago

Abhishek150598 commented 5 years ago

Hi, I was doing a similar project and used your code as a reference for making my bot. However, I am not seeing any significant learning progress :( The high score is 29 (1 per pipe) after 7000 iterations. Any suggestions on how to improve the algorithm? You created a high-death-rate flag (I suppose due to the higher probability of the bird hitting the top pipe); would you suggest creating a similar low-death flag? And how does it affect learning if you increase the penalty to, say, -4000?

chncyhn commented 5 years ago

Hello @Abhishek150598!

I think discretizing/gridding the x- and y-coordinates was quite helpful for me in converging faster. See this note in the readme:

I defined the states a little different from sarvagyavaish. In his version horizontal and vertical distances from the next pipe define the state of the bird. When I wrote the program to work like this, I found that convergence takes a very long time. So I instead discretized the distances to 10x10 grids, which greatly reduces the state space. Moreover, I added vertical velocity of the bird to the state space.

I ended up settling at 5x5 grids, but in general, you might want to try using this idea to reduce the size of your state space.
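To make the gridding idea concrete, here is a minimal sketch of discretizing the state. The grid size and the inclusion of vertical velocity follow the readme note above; the function name and exact coordinate handling are illustrative, not the repo's exact code.

```python
def discretize_state(x_dist, y_dist, y_vel, grid=5):
    """Map raw pipe distances to coarse grid cells to shrink the state space.

    x_dist, y_dist: horizontal/vertical distance to the next pipe.
    y_vel: current vertical velocity of the bird (kept at full resolution).
    grid: cell size; larger cells mean fewer states and faster convergence,
          at the cost of coarser decisions.
    """
    # Snap each distance down to the nearest multiple of `grid`.
    x_cell = int(x_dist) - (int(x_dist) % grid)
    y_cell = int(y_dist) - (int(y_dist) % grid)
    # A string key works well for a dict-based Q-table.
    return f"{x_cell}_{y_cell}_{y_vel}"

print(discretize_state(143, -37, 8))  # -> "140_-40_8"
```

With 5x5 cells, all 25 raw (x, y) positions inside one cell collapse into a single Q-table entry, which is what speeds up convergence.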

Indeed I added a high death flag, because dying on the top of a pipe is most often the result of a bad jump. Without the flag, it would take many iterations for the information that certain jumps cause these deaths to propagate back. Right before dying, the state-action pair will probably look like (somewhere high, don't jump), but the actual cause was an earlier pair that looked like (where the jump occurred, jump). With the flag, I am able to immediately punish that state-action pair.
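The flag can be sketched roughly like this: on a top-pipe death, walk back through the recent (state, action) history and penalize the last jump directly, rather than waiting for the penalty to diffuse backwards through ordinary updates. The names (`REWARD_DEATH`, the history format, action `1` meaning jump) are assumptions for illustration, not the repo's exact code.

```python
REWARD_DEATH = -1000  # illustrative penalty value

def penalize_last_jump(Q, history, alpha=0.7):
    """On a top-pipe death, punish the jump that caused it.

    Q: dict mapping state key -> [value_no_jump, value_jump].
    history: list of (state, action) pairs, newest last; action 1 == jump.
    alpha: learning rate.
    """
    for state, action in reversed(history):
        if action == 1:
            # This jump is what pushed the bird into the top pipe:
            # apply the death penalty to it immediately.
            Q[state][1] += alpha * (REWARD_DEATH - Q[state][1])
            break
```

Without this shortcut, only the final (high position, don't jump) pair gets the death penalty, and the blame reaches the earlier jump one propagation step per death.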

I don't think a low death flag makes sense, as it's naturally handled without flagging.

Honestly, I don't remember how hyperparameters like the reward function and learning rate affected convergence, but experimenting with these settings could indeed improve your algorithm.
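For reference, those hyperparameters all enter the standard tabular Q-learning update, so tuning them is just changing the constants below. This is a generic sketch with illustrative values, not the repo's exact settings.

```python
ALPHA = 0.7           # learning rate: how strongly new information overwrites old
GAMMA = 1.0           # discount factor for future reward
REWARD_ALIVE = 1      # small reward for surviving a frame
REWARD_DEATH = -1000  # death penalty; e.g. -4000 punishes deaths more heavily

def q_update(Q, s, a, r, s_next):
    """One tabular Q-learning step: Q[s][a] moves toward r + gamma * max_a' Q[s'][a']."""
    Q[s][a] = (1 - ALPHA) * Q[s][a] + ALPHA * (r + GAMMA * max(Q[s_next]))
```

A larger death penalty makes the bird more risk-averse faster, but can also slow convergence if it drowns out the small per-frame rewards, so it is worth sweeping a few values.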

Abhishek150598 commented 5 years ago

Thanks for the reply. I have been trying out a few variations and finally found a good algorithm. The performance has improved significantly over the last 1000 iterations!