Closed: reobroqn closed this issue 3 years ago
Hello, I am terribly sorry for having forgotten to answer this issue... Do you still need help?
Yes, please! Many thanks!
Well, it's simple. Just obtain (say) 10, or however many item embeddings your actor needs, then simply combine those items with your user's ratings. I concatenate them at the end: 128 (embedding size) * 10 + 10 * 1 (rating size). It's like [*item1, *item2, ..., *item10, rating1, rating2, ..., rating10], where * is like unpack from Python. Feed that vector to the actor; it produces "the ideal movie" for this specific item/rating combination. Then find (say) the top 5 closest movies based on Euclidean/cosine distance.
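A minimal sketch of that state construction in NumPy (the shapes and names are illustrative, not RecNN's actual API; the embeddings and ratings here are random stand-ins):

```python
import numpy as np

embedding_size = 128  # per-item embedding width, as described above
n_items = 10          # number of item embeddings the actor conditions on

# hypothetical inputs: 10 item embeddings and the user's 10 ratings
item_embeddings = np.random.randn(n_items, embedding_size)
ratings = np.random.rand(n_items)

# flatten the embeddings and append the ratings:
# 128 * 10 + 10 * 1 = 1290-dimensional state vector for the actor
state = np.concatenate([item_embeddings.ravel(), ratings])
print(state.shape)  # (1290,)
```

The actor network then maps this 1290-dimensional state to a single 128-dimensional "ideal movie" embedding.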
Here is a kitty pic as a reward for the long wait :)
Ok. I got it. Thanks for the pic!
What do you mean by
'Feed them to the actor, it produces "the ideal movie" for this specific item / rating combination'
My assumption here is that the policy net (actor) outputs a probability distribution over all the actions a user can select.
Also, what does this mean: 'Then find (say) top 5 closest movies based on euclidean / cosine distance'? The output of the policy net is a probability distribution over all the actions/movies.
'My assumption here is that the policy net (actor) outputs a prob distribution over all the actions a user can select.'
In Q-learning with discrete actions, this is true. However, in the continuous action setting, the actor produces an action with maximum reward.
'Then find (say) top 5 closest movies based on euclidean / cosine distance'
Take this action with maximum reward, apply some distance function, and find the 5 closest actions.
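The continuous-action step described above, sketched with cosine similarity (the 1000-movie catalogue and all names here are made up for illustration; in practice the action comes from the actor network):

```python
import numpy as np

def top_k_closest(action, catalogue, k=5):
    """Rank every movie embedding by cosine similarity to the actor's
    output ("the ideal movie") and return the indices of the k closest."""
    a = action / np.linalg.norm(action)
    m = catalogue / np.linalg.norm(catalogue, axis=1, keepdims=True)
    sims = m @ a                    # cosine similarity to each movie
    return np.argsort(-sims)[:k]    # k most similar movie indices

# hypothetical catalogue: 1000 movies with 128-d embeddings
catalogue = np.random.randn(1000, 128)
ideal_movie = np.random.randn(128)  # stand-in for the actor's output
recommendations = top_k_closest(ideal_movie, catalogue, k=5)
```

Swapping cosine for Euclidean distance just means ranking by `np.linalg.norm(catalogue - action, axis=1)` in ascending order instead.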
So in the discrete setting, would you just take the top-k outputs with the highest probability instead of using distance measures? By the way, I'm using the REINFORCE algo.
Yes, just take the highest probability
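In the discrete case that selection is just a sort over the policy's output probabilities; a hedged sketch (the 1000-movie action space and the random probabilities are stand-ins for a real policy net's softmax output):

```python
import numpy as np

# hypothetical policy output: a probability distribution over 1000 movies,
# e.g. the softmax output of a REINFORCE policy net
probs = np.random.rand(1000)
probs /= probs.sum()

k = 5
top_k_actions = np.argsort(-probs)[:k]  # indices of the k most probable movies
```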
And one more question, does it make sense to 'discount' the recommended actions with the beta network?
What do you mean by discounting?
As the title says! I have gone through the docs, but I don't know how to do that, or whether it can even be done. I only jumped into reinforcement-learning-based recommendations yesterday, so this may be a silly question. I thought it would work the same as other recommender system algorithms, but I got confused in the recommending part with the Actor and Critic. Thanks!