PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt
MIT License

Q-learning (chapter 6): why is sample_env() called only once per iteration? #60

Open BAXMAY opened 4 years ago

BAXMAY commented 4 years ago

In 01_frozenlake_q_learning.py, sample_env() is called only once per iteration of the training loop:

while True:
    iter_no += 1
    s, a, r, next_s = agent.sample_env()
    agent.value_update(s, a, r, next_s)
    reward = 0.0
    ...

I think this can be improved by calling sample_env() many times per iteration, like this:

while True:
    iter_no += 1
    for _ in range(1000):
        s, a, r, next_s = agent.sample_env()
        agent.value_update(s, a, r, next_s)
    reward = 0.0
    ...

This version solves the environment in far fewer iterations.

Is this valid, or am I misunderstanding the concept of Q-learning?
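
To make the comparison concrete, here is a minimal, self-contained sketch that counts environment steps as well as outer iterations. It does not use the book's code or FrozenLake: `ChainEnv`, `Agent`, `train`, and the constants are hypothetical stand-ins on a tiny deterministic chain, written only to mirror the shape of the loop above (random sampling, a blended Q-update, then a greedy test episode).

```python
import random
from collections import defaultdict

# Hypothetical toy setup, not the book's FrozenLake code.
N_STATES = 6          # states 0..5, state 5 is the goal
GOAL = N_STATES - 1
GAMMA = 0.9           # discount factor (illustrative value)
ALPHA = 0.2           # learning rate (illustrative value)


class ChainEnv:
    """Deterministic chain: action 1 moves right, action 0 moves left."""

    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), GOAL)
        done = self.state == GOAL
        return self.state, (1.0 if done else 0.0), done


class Agent:
    def __init__(self, rng):
        self.env = ChainEnv()
        self.state = self.env.reset()
        self.rng = rng
        self.values = defaultdict(float)  # Q-table keyed by (state, action)

    def sample_env(self):
        """Take one random step and return (s, a, r, next_s)."""
        action = self.rng.randrange(2)
        old_state = self.state
        new_state, reward, done = self.env.step(action)
        self.state = self.env.reset() if done else new_state
        return old_state, action, reward, new_state

    def best_value_and_action(self, state):
        # Ties resolve to action 0 (left), so an untrained agent fails.
        action = max((0, 1), key=lambda a: self.values[(state, a)])
        return self.values[(state, action)], action

    def value_update(self, s, a, r, next_s):
        """Blend the old Q-value with the one-step bootstrapped target."""
        best_v, _ = self.best_value_and_action(next_s)
        old = self.values[(s, a)]
        self.values[(s, a)] = old * (1 - ALPHA) + (r + GAMMA * best_v) * ALPHA

    def play_episode(self, max_steps=20):
        """Greedy test episode on a separate environment copy."""
        env = ChainEnv()
        state = env.reset()
        for _ in range(max_steps):
            _, action = self.best_value_and_action(state)
            state, reward, done = env.step(action)
            if done:
                return reward
        return 0.0


def train(samples_per_iter, seed=0, max_iters=100_000):
    """Return (iterations, environment steps) until the greedy policy solves the chain."""
    agent = Agent(random.Random(seed))
    for iter_no in range(1, max_iters + 1):
        for _ in range(samples_per_iter):
            agent.value_update(*agent.sample_env())
        if agent.play_episode() > 0:
            return iter_no, iter_no * samples_per_iter
    raise RuntimeError("not solved within max_iters")


if __name__ == "__main__":
    for n in (1, 10):
        iters, steps = train(n)
        print(f"samples_per_iter={n}: iterations={iters}, env_steps={steps}")
```

With the same seed, both settings see the same stream of random transitions and apply the same updates in the same order, so in this sketch the only difference is how often the greedy policy is tested. The batched run therefore cannot finish in fewer environment steps than the single-sample run, even if it reports fewer iterations. Whether the same holds for the book's script is something I have not verified.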