Code for reco-gym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising
The transition matrix of states seems wrong when we set "prob_leave_bandit" #29
Open
yxliu-ntu opened 4 years ago
https://github.com/criteo-research/reco-gym/blob/f8553d197f42ec2f415aefce48525d0e9b10ddaa/recogym/envs/reco_env_v1.py#L55-L57
Should line 56 of "reco_env_v1.py" use prob_leave_bandit as the last entry of the bandit row, i.e.
[self.config.prob_bandit_to_organic, 0, self.config.prob_leave_bandit]
?
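To make the question concrete, here is a minimal sketch of how I understand the matrix is meant to look with the proposed row. It is not the repository's code: the function name `build_transition_matrix`, the state order (organic, bandit, leave), the way the diagonal is filled in, and the probability values are all my own assumptions for illustration.

```python
import numpy as np


def build_transition_matrix(prob_organic_to_bandit, prob_leave_organic,
                            prob_bandit_to_organic, prob_leave_bandit):
    """Hypothetical 3-state matrix; assumed state order: organic, bandit, leave."""
    matrix = np.array([
        # From organic: stay (filled in below), go to bandit, leave.
        [0.0, prob_organic_to_bandit, prob_leave_organic],
        # From bandit: go to organic, stay (filled in below), leave.
        # This is the proposed row: it uses prob_leave_bandit.
        [prob_bandit_to_organic, 0.0, prob_leave_bandit],
        # Leaving is absorbing.
        [0.0, 0.0, 1.0],
    ])
    # Assumption: the "stay" probability is whatever remains in each row.
    matrix[0, 0] = 1.0 - matrix[0].sum()
    matrix[1, 1] = 1.0 - matrix[1].sum()
    return matrix


# Illustrative values only, not taken from the reco-gym defaults.
matrix = build_transition_matrix(
    prob_organic_to_bandit=0.25,
    prob_leave_organic=0.01,
    prob_bandit_to_organic=0.05,
    prob_leave_bandit=0.02,
)
print(matrix)
```

With this row, changing prob_leave_bandit changes the probability of leaving from the bandit state, and every row still sums to 1.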