chncyhn / flappybird-qlearning-bot

Flappy Bird Bot using Reinforcement Learning
MIT License
416 stars, 94 forks

Tendency to hit upper Pipe #3

Closed by yashkotadia 6 years ago

yashkotadia commented 6 years ago

Hi! I am trying to build my own version of a Flappy Bird bot. I've noticed that most of the time the bird dies because it flaps too much and ends up hitting the upper pipe. Do you have an explanation for why this happens?

chncyhn commented 6 years ago

Hello @yashkotadia!

This will indeed happen if you don't take precautions. Check out this section from the README of this project:

In addition if the bird dies by collapsing to the top-section of a pipe, the state where bird jumped gets flagged and is punished additionally. This works nice, since dying to the top-section of the pipe is almost always the result of a bad jump. The flagging helps propagating the information to this ‘bad’ [s,a] pair quickly.

So I punish the jump action at the state that led to the collision with the top pipe. You might want to implement similar logic to prevent this problem. Otherwise the penalty only reaches the states right before the collision, while the actual mistake was made earlier, when the 'jump' action was taken.
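
The idea of flagging the last jump can be sketched roughly as follows. This is a minimal illustration, not the code from this repository: the names `update_after_death`, `history`, and `died_on_top_pipe`, the reward values, and the learning rate are all assumptions made for the example. It replays one episode backwards after the bird dies, punishes the final step, and, if the death was on a top pipe, also punishes the most recent jump.

```python
from collections import defaultdict

# All constants below are assumed values chosen for illustration.
ALPHA = 0.7        # learning rate
GAMMA = 1.0        # discount factor
R_ALIVE = 0.0      # reward for an ordinary step
R_DEAD = -1000.0   # penalty for a step blamed for the death

NOOP, FLAP = 0, 1  # action indices into each Q-table row


def new_q_table():
    """Q-table mapping state -> [Q(s, NOOP), Q(s, FLAP)], initialised to 0."""
    return defaultdict(lambda: [0.0, 0.0])


def update_after_death(q, history, died_on_top_pipe):
    """Apply Q-learning updates for one finished episode.

    history: list of (state, action, next_state) in chronological order.
    died_on_top_pipe: True if the bird hit the upper pipe (hypothetical flag
    that the game loop would have to provide).
    """
    flag_pending = died_on_top_pipe
    # Walk backwards so the penalty propagates from the crash toward the start.
    for t, (s, a, s_next) in enumerate(reversed(history)):
        is_last_step = (t == 0)
        is_flagged_jump = flag_pending and a == FLAP
        if is_flagged_jump:
            flag_pending = False  # only the most recent jump is blamed
        reward = R_DEAD if (is_last_step or is_flagged_jump) else R_ALIVE
        best_next = max(q[s_next])
        q[s][a] = (1 - ALPHA) * q[s][a] + ALPHA * (reward + GAMMA * best_next)


if __name__ == "__main__":
    episode = [
        ("s0", NOOP, "s1"),
        ("s1", FLAP, "s2"),   # the bad jump
        ("s2", NOOP, "s3"),
        ("s3", NOOP, "dead"),
    ]
    q = new_q_table()
    update_after_death(q, episode, died_on_top_pipe=True)
    print(q["s1"][FLAP], q["s3"][NOOP])
```

Without the flag, a single episode would only lower the value of the last step, and the jump at `s1` would stay unpunished until the penalty slowly propagated back over many more episodes.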

yashkotadia commented 6 years ago

Oh! I understand what you mean. Makes sense. For now I'm seeing an average score of a mere 2 points after a million iterations. I'll try integrating this and see how it works. Thank you!!