Optimal policy chooses "zero" actions, see: Gambler's problem, file.

Almujtaba-Yaseen / Learning-Reinforcement-Learning

Tracking my RL learning journey...

Open Almujtaba-Yaseen opened 2 years ago

Almujtaba-Yaseen commented 2 years ago

Why does the optimal policy in my solution to the Gambler's problem choose a zero stake (action 0) in some states?

Choosing a zero stake at a given capital has no effect on that capital, which means we stay stuck at the same capital and never reach the goal.

Does this make any sense? Try to interpret those actions.

Or is there an error in the code?
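
One way to check this is a minimal sketch, not the code from the repository. It assumes the usual setup (goal capital of 100, win probability 0.4, no discounting, reward only on reaching the goal) and all names in it (`GOAL`, `P_HEADS`, `action_values`, `value_iteration`, `greedy_policy`) are made up for illustration. With a stake of 0 the next state is the current state, so its action value is `P_HEADS * V[s] + (1 - P_HEADS) * V[s] = V[s]`, which ties with the best stake once the values have converged. `np.argmax` returns the first maximum, so a tie can resolve to stake 0.

```python
import numpy as np

GOAL = 100      # assumed target capital
P_HEADS = 0.4   # assumed probability of winning a bet

def action_values(V, s, min_stake):
    """Undiscounted expected return of every allowed stake in state s."""
    stakes = np.arange(min_stake, min(s, GOAL - s) + 1)
    # The +1 reward for reaching GOAL is encoded as V[GOAL] = 1.
    q = P_HEADS * V[s + stakes] + (1 - P_HEADS) * V[s - stakes]
    return stakes, q

def value_iteration(min_stake, theta=1e-12, max_sweeps=10_000):
    """In-place value iteration; min_stake=0 allows the zero stake."""
    V = np.zeros(GOAL + 1)
    V[GOAL] = 1.0
    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(1, GOAL):
            _, q = action_values(V, s, min_stake)
            best = q.max()
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    return V

def greedy_policy(V, min_stake):
    """Greedy stake per state; argmax keeps the first maximum on ties."""
    policy = np.zeros(GOAL + 1, dtype=int)
    for s in range(1, GOAL):
        stakes, q = action_values(V, s, min_stake)
        policy[s] = stakes[np.argmax(q)]
    return policy

V0 = value_iteration(min_stake=0)   # zero stake allowed
V1 = value_iteration(min_stake=1)   # zero stake excluded
pi0 = greedy_policy(V0, min_stake=0)
pi1 = greedy_policy(V1, min_stake=1)

# Count states where stake 0 is (numerically) as good as the best stake.
ties = sum(
    np.isclose(action_values(V0, s, 0)[1][0], action_values(V0, s, 0)[1].max(), atol=1e-9)
    for s in range(1, GOAL)
)
print("states where stake 0 ties with the best stake:", ties)
print("states where the greedy policy picks stake 0:", int((pi0[1:GOAL] == 0).sum()))
```

If the two value functions agree and stake 0 ties in the states where it is chosen, the zero actions would be a tie-breaking artifact rather than a sign that the values are wrong. In that case, starting the stakes at 1, or breaking ties in favor of a nonzero stake, should give a policy that always moves the capital.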