Open BAXMAY opened 4 years ago
In 01_frozenlake_q_learning.py, each iteration of the training loop calls sample_env() only once.
01_frozenlake_q_learning.py
    while True:
        iter_no += 1
        s, a, r, next_s = agent.sample_env()
        agent.value_update(s, a, r, next_s)
        reward = 0.0
        ...
I think this can be improved by calling sample_env() many times per iteration, like this:
    while True:
        iter_no += 1
        for _ in range(1000):
            s, a, r, next_s = agent.sample_env()
            agent.value_update(s, a, r, next_s)
        reward = 0.0
        ...
With this change, the environment is solved in far fewer iterations.
Is this valid, or am I misunderstanding the concept of Q-learning?
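
To make the comparison concrete, here is a minimal sketch that counts environment steps as well as outer iterations. It does not use FrozenLake or the repository's agent: the 6-state chain, `step`, `greedy_reaches_goal`, and `train` are all hypothetical stand-ins, and the random action choice is only an assumption about how sample_env() explores.

```python
import random

N_STATES = 6        # hypothetical toy chain: states 0..5, goal is state 5
ACTIONS = (0, 1)    # 0 = move left, 1 = move right
GAMMA = 0.9
ALPHA = 0.5


def step(state, action):
    """Deterministic chain: reward 1.0 when the last state is reached."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    if nxt == N_STATES - 1:
        return nxt, 1.0, True
    return nxt, 0.0, False


def greedy_reaches_goal(q):
    """Follow the greedy policy from state 0 and report whether it hits the goal."""
    state = 0
    for _ in range(2 * N_STATES):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _, done = step(state, action)
        if done:
            return True
    return False


def train(updates_per_iter, seed=0, max_iters=10_000):
    """Tabular Q-learning; returns (outer iterations, environment steps)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    state, iters, env_steps = 0, 0, 0
    while iters < max_iters:
        iters += 1
        for _ in range(updates_per_iter):
            # Assumed stand-in for sample_env(): take one random action.
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            # Standard Q-learning update toward r + gamma * max_a' Q(s', a').
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            env_steps += 1
            state = 0 if done else nxt
        # Evaluate once per outer iteration, like the test phase of the loop.
        if greedy_reaches_goal(q):
            break
    return iters, env_steps


if __name__ == "__main__":
    for n in (1, 1000):
        iters, steps = train(n)
        print(f"updates_per_iter={n}: iterations={iters}, env_steps={steps}")
```

With the same seed, both variants in this sketch follow the same sequence of actions and updates, so the batched loop can only finish in fewer outer iterations by taking at least as many environment steps. Under these assumptions the iteration counter measures how often the policy is evaluated, not how much experience was used, which makes `env_steps` the fairer number to compare.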