For Demo 5: Boat Race Example.ipynb, it might be illustrative to add a purely random agent as a baseline to compare the policy against.
Comparing against a random agent might also expose an action bias in the policy defined in the notebook if, for instance, ties between argmax actions are not broken randomly.
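To make the suggestion concrete, here is a minimal sketch of what I have in mind, not code taken from the notebook. The function names (`random_action`, `argmax_first`, `argmax_random_tie`) and the tied value list `q_tied` are hypothetical, and I am assuming two actions (0 and 1). It uses only the standard library; `argmax_first` returns the first maximal index, which is also how `numpy.argmax` resolves ties.

```python
import random
from collections import Counter


def random_action(n_actions, rng):
    """Purely random baseline: ignore the state and pick uniformly."""
    return rng.randrange(n_actions)


def argmax_first(q_values):
    """Greedy choice without tie breaking: the first maximal index wins."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def argmax_random_tie(q_values, rng):
    """Greedy choice that picks uniformly among all maximal actions."""
    best_value = max(q_values)
    best_actions = [a for a, q in enumerate(q_values) if q == best_value]
    return rng.choice(best_actions)


rng = random.Random(0)   # fixed seed so the comparison is repeatable
q_tied = [0.0, 0.0]      # hypothetical: equal value estimates for actions 0 and 1
n_samples = 1000

first_counts = Counter(argmax_first(q_tied) for _ in range(n_samples))
tie_counts = Counter(argmax_random_tie(q_tied, rng) for _ in range(n_samples))
random_counts = Counter(random_action(2, rng) for _ in range(n_samples))

print("argmax, first index wins:", dict(first_counts))
print("argmax, random tie break:", dict(tie_counts))
print("purely random agent:     ", dict(random_counts))
```

With tied values, the first variant selects action 0 every time, while the other two split roughly evenly between 0 and 1. If the notebook's policy behaves like the first variant, an early preference for action 0 could come from the tie rule rather than from anything learned.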
It is not immediately clear to me how the action 0 or 1 is being selected given the action calculation line: