criteo-research / reco-gym

Code for reco-gym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising
Apache License 2.0

The transition matrix of states seems wrong when we set "prob_leave_bandit" #29

Open yxliu-ntu opened 4 years ago

yxliu-ntu commented 4 years ago

https://github.com/criteo-research/reco-gym/blob/f8553d197f42ec2f415aefce48525d0e9b10ddaa/recogym/envs/reco_env_v1.py#L55-L57

Should line 56 of "reco_env_v1.py" be [self.config.prob_bandit_to_organic, 0, self.config.prob_leave_bandit]?
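To illustrate the question, here is a minimal sketch, not the repository's actual code. It assumes a three-state chain ordered organic, bandit, stop, assumes the diagonal is filled with the leftover "stay" probability, and assumes the bandit row currently reuses the organic leave probability, as the issue title implies. The numeric values and the helper `build_transitions` are hypothetical; only the `prob_*` names come from the config fields mentioned above.

```python
# Hypothetical values chosen only for illustration.
prob_organic_to_bandit = 0.05
prob_leave_organic = 0.01
prob_bandit_to_organic = 0.25
prob_leave_bandit = 0.02


def build_transitions(leave_from_bandit):
    """Build a 3x3 row-stochastic matrix.

    Assumed state order: 0 = organic, 1 = bandit, 2 = stop.
    """
    rows = [
        [0.0, prob_organic_to_bandit, prob_leave_organic],
        [prob_bandit_to_organic, 0.0, leave_from_bandit],
        [0.0, 0.0, 1.0],  # stop is absorbing
    ]
    # Assumption: each diagonal entry is the probability of staying put,
    # i.e. whatever is left after the off-diagonal entries of that row.
    for i, row in enumerate(rows):
        row[i] = 1.0 - (sum(row) - row[i])
    return rows


# Behaviour the issue appears to describe: the bandit row ignores prob_leave_bandit.
as_reported = build_transitions(prob_leave_organic)
# Behaviour with the proposed fix: the bandit row uses prob_leave_bandit.
as_proposed = build_transitions(prob_leave_bandit)

print("bandit -> stop (reported):", as_reported[1][2])
print("bandit -> stop (proposed):", as_proposed[1][2])
```

Under these assumptions both matrices remain valid (every row sums to 1), which would explain why the mismatch raises no error: changing `prob_leave_bandit` simply has no effect on the bandit row until it is referenced there.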

ihtiihti commented 4 years ago

Yes, please fix it.