awarebayes / RecNN

Reinforced Recommendation toolkit built around PyTorch 1.7
Apache License 2.0
577 stars 114 forks

Confused about training #4

Closed davidjiangt closed 5 years ago

davidjiangt commented 5 years ago

In the existing literature, agents explore the environment and a model is then trained on the interaction data. But I found that in your code, you train directly on data from the dataset. I was wondering whether I misunderstood your code or...

awarebayes commented 5 years ago

Maybe you are confusing it with on-policy learning? How would I explore off-policy? What do you mean by exploring? There is a BCQ section that focuses on RL without exploration on a static dataset, using VAEs and sampled actions. Top-K Off-Policy Correction for a REINFORCE Recommender System (Minmin Chen et al., https://arxiv.org/abs/1812.02353) proposes a correction algorithm, but it only works on top of recommendations provided by existing models (BCQ, DDPG, TD3, SAC):

Screenshot (quoting the paper):

Unlike classical reinforcement learning, our learner does not have real-time interactive control of the recommender due to learning and infrastructure constraints. In other words, we cannot perform online updates to the policy and generate trajectories according to the updated policy immediately. Instead we receive logged feedback of actions chosen by a historical policy (or a mixture of policies), which could have a different distribution over the action space than the policy we are updating.
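For intuition, here is a minimal PyTorch sketch of the corrected policy-gradient weight from that paper; the function name and tensor shapes are my own illustration, not part of RecNN:

```python
import torch

def topk_correction(log_pi, log_beta, k):
    """Top-K off-policy correction weight from Chen et al. 2019.

    log_pi   -- log pi_theta(a|s) under the policy being trained
    log_beta -- log beta(a|s) under the behaviour policy that logged the data
    k        -- slate size (number of items recommended at once)
    """
    pi = log_pi.exp()
    # importance weight: corrects for training on actions chosen
    # by a historical policy rather than the current one
    importance = (log_pi - log_beta).exp()
    # lambda_K: accounts for the action appearing anywhere in a top-K slate
    lambda_k = k * (1.0 - pi).pow(k - 1)
    return importance * lambda_k

# REINFORCE-style loss over a batch of logged (action, reward) pairs:
# loss = -(topk_correction(log_pi, log_beta, 10).detach() * reward * log_pi).mean()
```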

I primarily focus on Batch-Constrained Q-learning (BCQ): https://arxiv.org/abs/1812.02900
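To make the "train directly from the dataset" point concrete, here is a minimal self-contained sketch of off-policy training on a static log of transitions, in the spirit of the repo's DDPG setup; the dimensions, networks, and synthetic data below are placeholders, not RecNN's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dimensions; RecNN's real state/action embeddings differ.
STATE_DIM, ACTION_DIM = 32, 16

# A static log of transitions stands in for environment interaction:
# (state, action, reward, next_state) tuples recorded by a historical policy.
n = 10_000
log = TensorDataset(
    torch.randn(n, STATE_DIM),   # states
    torch.randn(n, ACTION_DIM),  # actions taken by the logged policy
    torch.randn(n, 1),           # observed rewards
    torch.randn(n, STATE_DIM),   # next states
)
loader = DataLoader(log, batch_size=256, shuffle=True)

actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM))
critic = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma = 0.99

for state, action, reward, next_state in loader:
    # critic update: TD target computed entirely from logged data,
    # no new trajectories are ever generated
    with torch.no_grad():
        target = reward + gamma * critic(torch.cat([next_state, actor(next_state)], -1))
    td_loss = F.mse_loss(critic(torch.cat([state, action], -1)), target)
    critic_opt.zero_grad(); td_loss.backward(); critic_opt.step()

    # actor update: deterministic policy gradient through the critic
    pg_loss = -critic(torch.cat([state, actor(state)], -1)).mean()
    actor_opt.zero_grad(); pg_loss.backward(); actor_opt.step()
```

The key point is that the loop only ever samples from the fixed dataset, which is exactly the batch (offline) RL setting the papers above address.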

awarebayes commented 5 years ago

Any update? Should I close this issue? I also noticed that you forked the repo a month ago; I would suggest you git pull, since pretty much everything has changed.

davidjiangt commented 5 years ago

Thank you very much for your reply!

davidjiangt commented 5 years ago

Yes, you can close this issue.