andland / implicitcf

Explanation of Implicit MF with logistic loss #6

Closed nickresnick closed 7 years ago

nickresnick commented 7 years ago

Hey Andrew -- not an issue, but I came across your answer on this Cross Validated thread: http://stats.stackexchange.com/questions/40447/collaborative-filtering-and-implicit-ratings-normalization/205326#205326

I've been studying the MF techniques, specifically the one with a logistic loss. I see you refer to the likelihood used in the paper as a "negative weighted Bernoulli log likelihood". Looking at eq (2) in the paper: http://stanford.edu/~rezab/nips2014workshop/submits/logmat.pdf, isn't he missing an exponent on the (1-p) term, as is needed in a Bernoulli likelihood?

The way it's implemented now, positive outcomes are still multiplied by (1 - p), which shouldn't be the case, right?

Thanks!

andland commented 7 years ago

You are right. When I originally read the paper, I just assumed they were weighting the log likelihood. It is actually a little weirder. Equation (2) carries through to eq (3) and the gradients, so it is not just a typo in eq (2). What their objective is saying is that negative elements count as a single negative observation, but positive elements count as $\alpha r_{ui}$ positive observations and 1 negative observation.

I guess this is a way of regularizing, although I'm not sure it is what they were intending. In fact, they state "let each nonzero element $r_{ui} \neq 0$ serve as c positive observations and each zero element $r_{ui} = 0$ serve as a single negative observation."
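To spell out the difference (my notation, not the paper's: $p_{ui} = \sigma(x_u^\top y_i + \beta_u + \beta_i)$ and $l_{ui} = 1$ if $r_{ui} > 0$, else 0; regularization dropped), the per-element term that eq (2) implies is

$\alpha r_{ui} \log p_{ui} + \log(1 - p_{ui})$

whereas the weighted Bernoulli log likelihood I had assumed would be

$\alpha r_{ui} \log p_{ui} + (1 - l_{ui}) \log(1 - p_{ui})$

so the only difference is the missing exponent on the $(1 - p_{ui})$ term that you noticed.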

Thanks for pointing this out!

nickresnick commented 7 years ago

Cool, yeah that's what I thought. Thanks for the response, I thought I was crazy.

So if we change it to $(1 - p_{ui})^{1 - l_{ui}}$, where $l_{ui}$ is the "binarized" $R$ matrix, this should be correct, right?

Btw, I can see you've done a lot of work with recommender systems. Any cool collaborative filtering techniques catch your eye lately?

andland commented 7 years ago

Yeah that would correct it.
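For concreteness, here is a minimal NumPy sketch of that corrected per-element log likelihood. This is just an illustration of the formula, not code from this package, and the names (`R`, `X`, `Y`, `alpha`) are made up:

```python
import numpy as np

def corrected_log_likelihood(R, X, Y, alpha=1.0):
    """Weighted Bernoulli log likelihood for logistic MF (bias terms and
    regularization omitted to keep the illustration short).

    R: (n_users, n_items) implicit count matrix
    X: (n_users, k) user factors; Y: (n_items, k) item factors
    """
    L = (R > 0).astype(float)           # binarized matrix, l_ui
    scores = X @ Y.T                    # x_u . y_i for every (u, i) pair
    p = 1.0 / (1.0 + np.exp(-scores))   # p_ui = sigma(x_u . y_i)
    # positives are weighted by alpha * r_ui; only zero entries
    # contribute the log(1 - p_ui) term
    return np.sum(alpha * R * np.log(p) + (1.0 - L) * np.log(1.0 - p))
```

The objective as written in the paper's eq (2) would drop the `(1.0 - L)` factor, so every entry, including the positives, picks up a `log(1 - p_ui)` term.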

I have been thinking a bit about word2vec and how it relates to matrix factorization techniques for collaborative filtering. I think the negative sampling version of word2vec can be interpreted as a stochastic version of logistic MF.
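Roughly, the correspondence I have in mind (my own shorthand, with word vectors $w$ and context vectors $c$ standing in for $x_u$ and $y_i$): each observed (word, context) pair in SGNS contributes $\log \sigma(w^\top c) + \sum_{j=1}^{k} \log \sigma(-w^\top c_j)$ for $k$ sampled negative contexts, while logistic MF sums $\log \sigma(x_u^\top y_i)$ terms over the observed entries and $\log(1 - \sigma(x_u^\top y_i))$ terms over all the zero entries. The negatives are sampled in one and enumerated in full in the other, which is why I think of SGNS as the stochastic version.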

nickresnick commented 7 years ago

Cool, thanks for sharing. I've been hearing about word2vec all over, definitely something I'll have to look into.

One last q before I close this: Is it correct to think about weighting a likelihood as a data augmentation technique as opposed to changing the underlying probability distribution? I.e. in L(params | data), it's the "data" we're changing, not the underlying likelihood?

andland commented 7 years ago

Giving an observation a weight of w (where w is a positive integer) is equivalent to having that observation be repeated w times. So in that sense, yes.
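In symbols, for a single observation $x$ with likelihood contribution $p(x \mid \theta)$: $w \log p(x \mid \theta) = \sum_{j=1}^{w} \log p(x \mid \theta)$, which is exactly the log likelihood of $w$ independent copies of $x$. For non-integer $w$ (like $\alpha r_{ui}$ here) the same expression still defines a weighted log likelihood, it just loses the literal repetition interpretation.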