
Bahador-Bakhshi / 5G-Federation


Bonus based Exploration #21

Open Bahador-Bakhshi opened 3 years ago

Bahador-Bakhshi commented 3 years ago

Q(s, a) ← (1 − α) Q(s, a) + α [r + γ ⋅ max_{a′} f(Q(s′, a′), N(s′, a′))]

In this equation:

N(s′, a′) counts the number of times the action a′ was chosen in state s′.
f(Q, N) is an exploration function, such as f(Q, N) = Q + κ/(1 + N), where κ is a curiosity hyperparameter that measures how much the agent is attracted to the unknown.
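
A minimal sketch of the update above in Python, for illustration only. The names `exploration_value` and `bonus_q_update`, the dictionary-based tables, and the default hyperparameter values are assumptions, not code from this repository. The sketch also assumes that N(s, a) is incremented whenever the pair (s, a) is updated and that s′ is a non-terminal state.

```python
from collections import defaultdict


def exploration_value(q, n, kappa):
    """Exploration function f(Q, N) = Q + kappa / (1 + N)."""
    return q + kappa / (1 + n)


def bonus_q_update(Q, N, s, a, r, s_next, actions,
                   alpha=0.1, gamma=0.9, kappa=1.0):
    """Apply one bonus-based Q-learning update for the transition (s, a, r, s_next).

    Q and N map (state, action) pairs to a value estimate and a visit count.
    Assumption: the count for (s, a) is incremented here, at update time.
    """
    N[(s, a)] += 1
    # Best next-state value, with each action's estimate inflated by its bonus.
    best_next = max(
        exploration_value(Q[(s_next, a2)], N[(s_next, a2)], kappa)
        for a2 in actions
    )
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
    return Q[(s, a)]


# Hypothetical usage: two states, two actions, all tables start at zero.
Q = defaultdict(float)
N = defaultdict(int)
new_q = bonus_q_update(Q, N, s=0, a=0, r=1.0, s_next=1, actions=[0, 1],
                       alpha=0.5, gamma=0.5, kappa=2.0)
```

With these numbers, neither action in state 1 has been tried, so each gets a bonus of κ/(1 + 0) = 2 and the target becomes 1 + 0.5 ⋅ 2 = 2. As N(s′, a′) grows, the bonus shrinks toward zero and the update approaches ordinary Q-learning.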