chncyhn / flappybird-qlearning-bot

Flappy Bird Bot using Reinforcement Learning
MIT License

Is it guaranteed to converge? #6

Closed. w3ntao closed this issue 4 years ago.

w3ntao commented 6 years ago

I applied your strategy in my own program:

# Q-learning update, applied once when the bird dies:
Q[lastS][lastA] = (1 - lrAlpha) * Q[lastS][lastA] + lrAlpha * deadReward

S_ = lastS  # successor of the most recent (S, A) pair in expHistory
for S, A in reversed(expHistory):
    Q[S][A] = (1 - lrAlpha) * Q[S][A] + lrAlpha * (aliveReward + gamma * max(Q[S_]))
    S_ = S
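For reference, the update above can be written as a self-contained Python function. This is a minimal sketch, assuming `Q` maps each discretized state to a two-element list of action values (0 = do nothing, 1 = flap), that `expHistory` holds the (state, action) pairs taken *before* the fatal `(lastS, lastA)` pair, and that the successor of the last pair in the history is `lastS`. The function and argument names are hypothetical snake_case versions of the names in the issue, not code from either repository.

```python
from collections import defaultdict


def update_on_death(Q, exp_history, last_s, last_a,
                    lr_alpha, gamma, alive_reward, dead_reward):
    """Hypothetical backward Q-learning sweep, run once per episode at death.

    Q: dict mapping state -> [value_of_action_0, value_of_action_1]
    exp_history: (state, action) pairs before the fatal one, in the order taken.
    """
    # The fatal transition has no successor, so its target is just dead_reward.
    Q[last_s][last_a] = (1 - lr_alpha) * Q[last_s][last_a] + lr_alpha * dead_reward

    # Walk backwards so each state bootstraps from its already-updated successor;
    # this lets the death penalty propagate through the whole episode in one pass.
    next_s = last_s
    for s, a in reversed(exp_history):
        target = alive_reward + gamma * max(Q[next_s])
        Q[s][a] = (1 - lr_alpha) * Q[s][a] + lr_alpha * target
        next_s = s
    return Q
```

With `Q = defaultdict(lambda: [0.0, 0.0])`, unseen states start with zero values, so the function can be called directly on a freshly created table.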

My bird reached its highest score at around round 500, and then the scores kept going down. By around round 1000, it never made it past more than one pipe and kept hitting the ceiling or the upper pipe.

Have you encountered such a problem?


chncyhn commented 6 years ago

Hello @WentaoZero, I don't remember having this problem.

Does this happen consistently across all runs? Also, have you tried different hyperparameters (learning rate, etc.) and discretization levels?
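Both knobs mentioned here can be made explicit parameters. A constant `lrAlpha` does not satisfy the step-size conditions of the classic tabular Q-learning convergence result (the sum of step sizes must diverge while the sum of their squares stays finite, and every state-action pair must keep being visited), and a coarse discretization can merge situations that need different actions. The sketch below assumes the state is the horizontal and vertical pixel distance to the next pipe; `discretize`, `cell_size`, and `VisitCountLR` are hypothetical helpers, not part of the repository. A visit-count schedule is one common alternative to a constant rate, not the method used in this project.

```python
from collections import defaultdict
from typing import Dict, Tuple

State = Tuple[int, int]


def discretize(dx: float, dy: float, cell_size: int) -> State:
    """Hypothetical helper: bucket pixel distances to the next pipe into grid cells.

    Larger cell_size -> fewer states (faster to fill the Q table, coarser policy).
    Floor division keeps negative offsets (bird below the gap) in their own cells.
    """
    return (int(dx // cell_size), int(dy // cell_size))


class VisitCountLR:
    """Hypothetical per-(state, action) learning rate: alpha = 1 / N(s, a).

    sum(1/n) diverges and sum(1/n^2) converges, so this schedule meets the
    step-size conditions that a constant alpha does not.
    """

    def __init__(self) -> None:
        self.visits: Dict[Tuple[State, int], int] = defaultdict(int)

    def __call__(self, s: State, a: int) -> float:
        # Count this visit, then return the decayed step size for the pair.
        self.visits[(s, a)] += 1
        return 1.0 / self.visits[(s, a)]
```

In the update loop, `lrAlpha` would be replaced by `lr(S, A)` for each pair, and runs with several `cell_size` values can be compared to see whether the late-game collapse depends on the grid resolution.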

w3ntao commented 6 years ago

Hi @chncyhn, thanks for replying. I posted that a week ago, and I have already fixed it. This is my repo: https://github.com/WentaoZero/q-bird and your work is cited 😄

chncyhn commented 6 years ago

@WentaoZero That's great to hear! Very nice project. 👍