Yongw-Z / SARSAvsQlearning

Comparison of the SARSA and Q-learning algorithms in model-free reinforcement learning, based on a maze model.
GNU General Public License v3.0

Summarize #1

Yongw-Z opened this issue 1 year ago

Yongw-Z commented 1 year ago

As the paths plotted in the .ipynb file show, with SARSA the agent learns a route to the goal that takes a detour, avoiding the traps as much as possible.

Recalling the SARSA update rule explains this: SARSA is on-policy, so the action the agent actually takes in the next state, including occasional exploratory actions, enters the value update. Exploratory steps into traps drag down the values of nearby states, and the agent therefore acquires a safe strategy.
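For reference, here is a minimal sketch of the standard SARSA update for a tabular Q-function. The NumPy array layout `Q[state, action]`, the function name, and the parameter values are my assumptions for illustration, not necessarily what the notebook uses.

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy TD(0): the target bootstraps from the action a_next that
    # the behaviour policy (e.g. epsilon-greedy) will actually take in
    # s_next, so exploratory slips into traps lower the values of the
    # states around them. Terminal-state handling omitted for brevity.
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])
```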

Yongw-Z commented 1 year ago

Q-learning, on the other hand, learns the shortest path: it is off-policy, updating each value with the best next action rather than the action actually taken, so it favours the direct route despite the high risk of falling into a trap along the way.
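For comparison, a sketch of the corresponding Q-learning update under the same assumed tabular layout. The only difference from the SARSA sketch above is the target: the max over next actions instead of the action actually chosen.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy TD(0): the target bootstraps from the greedy (max)
    # action in s_next, regardless of what the behaviour policy does,
    # so the values converge toward the shortest, riskier path.
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
```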

Yongw-Z commented 1 year ago

In conclusion, the value function converges more slowly with SARSA than with Q-learning because the randomness of the ε-greedy behaviour policy feeds directly into its update targets, but SARSA is worth considering in situations where the cost of trial-and-error mistakes during learning is high.
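As a companion to the conclusion, a minimal sketch of ε-greedy action selection under the same assumed layout; it is this random branch that both injects exploratory actions into SARSA's update targets (slowing convergence) and keeps the learned policy cautious around traps.

```python
import numpy as np

def epsilon_greedy(Q, s, epsilon=0.1, rng=None):
    # With probability epsilon take a uniformly random action,
    # otherwise the greedy one for state s.
    rng = rng if rng is not None else np.random.default_rng()
    n_actions = Q.shape[1]
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))
```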