awarebayes / RecNN

Reinforced Recommendation toolkit built around pytorch 1.7
Apache License 2.0
574 stars, 113 forks

How to make recommendations for a specific user? #23

Closed reobroqn closed 3 years ago

reobroqn commented 3 years ago

As the title says! I have gone through the docs, but I don't know how to do that, or whether it can even be done. I only started looking into reinforcement-learning-based recommendation yesterday, so this may be a silly question. I thought it would work the same as other recommender system algorithms, but I got confused by the recommending part with the Actor and Critic. Thanks!

awarebayes commented 3 years ago

Hello, I am terribly sorry for having forgotten to answer this issue... Do you still need help?

reobroqn commented 3 years ago

Yes, please! Many thanks!

awarebayes commented 3 years ago

Well, it's simple. Just obtain (say) 10 item embeddings, or however many your actor needs, then combine them with your user's ratings for those items. I concatenate the ratings at the end: 128 (embedding size) * 10 + 10 * 1 (rating size). It's like [*item1, *item2, ..., *item10, rating1, rating2, ..., rating10], where * is like unpack in Python. Feed this to the actor, and it produces "the ideal movie" for this specific item / rating combination. Then find (say) the top 5 closest movies based on Euclidean / cosine distance.
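A minimal NumPy sketch of the state construction described above, for anyone following along. The random embeddings, the ratings, and the one-layer `actor` function are placeholders invented for illustration; they are not RecNN's API, and a real setup would use learned embeddings and a trained actor network.

```python
import numpy as np

EMBED_DIM = 128  # embedding size mentioned in the comment above
N_ITEMS = 10     # number of items the actor expects

rng = np.random.default_rng(0)

# Placeholder data: 10 item embeddings and the user's 10 ratings for them.
item_embeddings = rng.normal(size=(N_ITEMS, EMBED_DIM))
ratings = rng.integers(1, 6, size=N_ITEMS).astype(np.float64)

# State = [*item1, *item2, ..., *item10, rating1, ..., rating10]
state = np.concatenate([item_embeddings.reshape(-1), ratings])
# Length is 128 * 10 + 10 * 1 = 1290

# Hypothetical stand-in for a trained actor: one random linear layer + tanh.
W = rng.normal(size=(state.shape[0], EMBED_DIM)) * 0.01

def actor(s):
    """Map a state vector to an action in item-embedding space."""
    return np.tanh(s @ W)

action = actor(state)  # the "ideal movie" embedding, shape (128,)
```

The resulting `action` lives in the same 128-dimensional space as the item embeddings, which is what makes the nearest-neighbour lookup in the last step possible.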

Here is a kitty pic as a reward for the long wait :) [image: photo_2021-05-20_16-03-45]

reobroqn commented 3 years ago

Ok. I got it. Thanks for the pic!

moses-bm commented 2 years ago

What do you mean by

'Feed them to the actor, it produces "the ideal movie" for this specific item / rating combination'

My assumption here is that the policy net (actor) outputs a probability distribution over all the actions a user can select.

Also, what does 'Then find (say) top 5 closest movies based on euclidean / cosine distance' mean, since the output of the policy net is a probability distribution over all the actions/movies?

awarebayes commented 2 years ago

'My assumption here is that the policy net (actor) outputs a probability distribution over all the actions a user can select.'

In Q-learning with discrete actions, this is true. However, in the continuous action setting, the actor produces a single action: the one it expects to give the maximum reward.

'Then find (say) top 5 closest movies based on euclidean / cosine distance' since the output of the policy net is a probability distribution over all the actions/movies

Take this action with maximum reward. Apply some distance function and find the 5 closest actions.
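The lookup step can be sketched roughly like this, assuming the actor's output and all item embeddings are NumPy arrays in the same space. The function name `closest_items` and the random data are made up for the example and are not part of RecNN.

```python
import numpy as np

def closest_items(action, item_matrix, k=5, metric="euclidean"):
    """Return indices of the k rows of item_matrix closest to action."""
    if metric == "euclidean":
        dist = np.linalg.norm(item_matrix - action, axis=1)
    elif metric == "cosine":
        # Cosine distance = 1 - cosine similarity; epsilon avoids division by zero.
        norms = np.linalg.norm(item_matrix, axis=1) * np.linalg.norm(action)
        dist = 1.0 - (item_matrix @ action) / (norms + 1e-12)
    else:
        raise ValueError(f"unknown metric: {metric}")
    return np.argsort(dist)[:k]  # smallest distance first

rng = np.random.default_rng(0)
all_items = rng.normal(size=(1000, 128))  # placeholder item embeddings

# Pretend the actor produced something very close to item 42.
action = all_items[42] + 0.01 * rng.normal(size=128)

top5_euclidean = closest_items(action, all_items, k=5, metric="euclidean")
top5_cosine = closest_items(action, all_items, k=5, metric="cosine")
```

For a large catalogue, a brute-force scan like this could be swapped for an approximate nearest-neighbour index, but that is a separate design choice.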

moses-bm commented 2 years ago

So in the discrete setting, would you just take the top k outputs with the highest probability instead of using distance measures? By the way, I'm using the REINFORCE algorithm.

awarebayes commented 2 years ago

Yes, just take the actions with the highest probability.
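In the discrete case, that selection might look like the sketch below, assuming the policy output is already a probability vector with one entry per item. The helper name `top_k_actions` and the example probabilities are illustrative only.

```python
import numpy as np

def top_k_actions(probs, k=5):
    """Return indices of the k most probable actions, highest first."""
    probs = np.asarray(probs)
    return np.argsort(probs)[::-1][:k]

# Toy policy output over 6 items (sums to 1).
probs = np.array([0.05, 0.30, 0.10, 0.25, 0.02, 0.28])

top3 = top_k_actions(probs, k=3)  # item indices 1, 5, 3
```

No distance function is needed here, because each output index already corresponds to a concrete item.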

moses-bm commented 2 years ago

And one more question, does it make sense to 'discount' the recommended actions with the beta network?

awarebayes commented 2 years ago

What do you mean by discounting?
