chncyhn / flappybird-qlearning-bot

Flappy Bird Bot using Reinforcement Learning
MIT License

Brilliant #1

Closed TeCoEd closed 4 years ago

TeCoEd commented 8 years ago

Excellent program, thank you for sharing. Is there a way to reset the rewards? I would like to see the 'learning'; my Flappy Bird scored 246 on its first go! Thanks

chncyhn commented 8 years ago

Hello Dan,

Thanks a lot for your kind words!

Yes, you can reset the Q-values by running the initialize_qvalues.py script. It resets all Q-values in the qvalues.json file to zero, so the learning would start from scratch.

Once you start running it, the updated Q-values are dumped to this JSON file every 25 games. You can change this frequency by editing the DUMPING_N variable in bot.py.

Also, if you want to speed things up a little, you can increase the FPS global variable in flappy.py (which is normally set to 60).
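
For illustration, a reset script along these lines is all that is needed; the state-key format and discretization ranges below are my placeholders, and the actual initialize_qvalues.py may bucket the state space differently:

```python
import json

# Build a fresh Q-table where every state maps to [0, 0]
# (one Q-value per action: do nothing, flap).
# The "xdif_ydif_yvel" key format and the ranges are assumptions.
qvalues = {}
for xdif in range(-40, 421, 10):        # horizontal distance buckets (assumed)
    for ydif in range(-300, 421, 10):   # vertical distance buckets (assumed)
        for yvel in range(-10, 11):     # bird's y-velocity (assumed)
            qvalues[f"{xdif}_{ydif}_{yvel}"] = [0, 0]

with open("qvalues.json", "w") as f:
    json.dump(qvalues, f)
```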

Cihan

TeCoEd commented 8 years ago

Hi Cihan, yes that worked very well, thank you.

TeCoEd commented 8 years ago

Is there an easy explanation of how it measures the x and y pixel position, or is this the position relative to the pipe?

chncyhn commented 8 years ago

If you are talking about the xdif and ydif variables in the states, then yes, they indicate the horizontal and vertical distance to the next pipe, respectively. They are calculated by subtracting the bird's pixel position from the next pipe's top-left corner position.

```python
bot.act(-playerx + myPipe['x'], -playery + myPipe['y'], playerVelY)
```

You can see them being calculated in the first two arguments of this call.
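
For intuition, here is a minimal sketch of how those two inputs could be produced each frame; the pipe-selection logic and the pipe_offsets helper are my own illustration, not the exact code in flappy.py:

```python
# Hypothetical helper, for illustration only: pick the next pipe ahead of the
# bird and compute the two distances that are fed into bot.act().
PIPE_WIDTH = 52  # assumed sprite width; the real game reads it from the image

def pipe_offsets(playerx, playery, upperPipes):
    # First pipe whose right edge the bird has not passed yet
    myPipe = next((p for p in upperPipes if p["x"] + PIPE_WIDTH > playerx),
                  upperPipes[0])
    xdif = myPipe["x"] - playerx   # horizontal distance to the next pipe
    ydif = myPipe["y"] - playery   # vertical distance to its top corner
    return xdif, ydif
```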

TeCoEd commented 8 years ago

Hi Cihan, this makes sense. So is the calculation of the next pipe position done in real time, or is it part of the Q array? I am beginning to understand it! Thanks

chncyhn commented 8 years ago

Hello Ted,

Yes, the calculations are done in real time. The x- and y-distances are computed at every tick of the game loop. Combining these differences with the bird's current y-velocity gives the bird's current state, (xdif, ydif, yvel).

Then we look up the Q[s, a] values for this state in the Q-array, as you said, and choose the action a with the higher Q-value. If the value of 'jumping' is higher, the bird jumps; otherwise it does nothing. The bot makes this decision at every tick of the game loop.
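
Roughly, that per-tick decision could be sketched like this; the "xdif_ydif_yvel" key format and the act function shown here are assumptions for illustration rather than the exact code in bot.py:

```python
import json

# Load the Q-table once; qvalues.json maps a state key to a pair of Q-values,
# one per action: [do nothing, flap].
with open("qvalues.json") as f:
    qvalues = json.load(f)

def act(xdif, ydif, yvel):
    state = f"{xdif}_{ydif}_{yvel}"     # current state s = (xdif, ydif, yvel)
    q_noop, q_flap = qvalues[state]     # look up Q[s, a] for both actions
    return 1 if q_flap > q_noop else 0  # 1 = flap (jump), 0 = do nothing
```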

Cihan