etlundquist / rankfm

Factorization Machines for Recommendation and Ranking Problems with Implicit Feedback Data
GNU General Public License v3.0
170 stars 36 forks

Question: User/Item Interaction Features #25

Closed: allard-jeff closed this issue 4 years ago

allard-jeff commented 4 years ago

This looks like a very promising library - congrats!

I am not familiar with the theory yet, but is it possible to include user/item interaction features? For example, a typical use case is the amount of time elapsed since a product was last purchased.

etlundquist commented 4 years ago

Thanks! Right now it's possible to include user-level auxiliary features, item-level auxiliary features, but not user-item (interaction-level) auxiliary features. So you could include a feature at the user level which denotes the time since last purchase of any item for each user, but not time since last purchase for each distinct user-item combination.

The underlying theory/math doesn't preclude doing the latter, but the data representation and internal math just gets very messy and more complex. Might be something for the next major release if the package gets enough traction!
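To make the distinction concrete, here's a rough sketch of passing a user-level feature like "days since last purchase" at fit time. The column names and values are purely illustrative; check the README for the exact data formats that `fit()` expects:

```python
import pandas as pd
from rankfm.rankfm import RankFM

# implicit feedback interactions: one row per observed (user_id, item_id) pair
interactions = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "item_id": [10, 11, 10, 12],
})

# user-level auxiliary features: the first column identifies the user, the remaining
# columns are feature values (e.g. days since the user's last purchase of ANY item)
user_features = pd.DataFrame({
    "user_id": [1, 2, 3],
    "days_since_last_purchase": [3.0, 27.0, 120.0],
})

model = RankFM(factors=20, loss="bpr")
model.fit(interactions, user_features=user_features, epochs=20, verbose=True)
scores = model.predict(interactions)  # utility scores for the given (user, item) pairs
```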

I wrote this Medium article to explain the motivation and theory, it might help you understand what's going on under the hood if you're interested: https://towardsdatascience.com/factorization-machines-for-item-recommendation-with-implicit-feedback-data-5655a7c749db

allard-jeff commented 4 years ago

I need to read more about the theory. I am definitely aware of FMs generally. I am curious about a couple of things, if you don't mind:

1) Is the full model exposed, i.e. all the coefficients and biases, along with the interaction factors, so that we can see/use them?
2) When doing item/item similarity, for example, is it just the elements of the interaction factorization (the latent factors) that are used, with the biases and individual coefficients ignored?

I am seeing some really great results using just user and item IDs, relative to xLearn or a deep neural network using embedding vectors and several dense layers!

etlundquist commented 4 years ago
  1. You can access all feature weights and latent factors as instance attributes. If you look at the code for the _reset_state() method you'll see all the internal model data that gets stored when the model is fitted (see the sketch just below this list).
  2. The item-item similarity is just using the items' latent factor space representations. The scalar weights don't enter into the similarity calculations (think of those as more like each item's unpersonalized popularity or average utility adjustment). You can see the equations here: https://github.com/etlundquist/rankfm/blob/master/rankfm/rankfm.py#L423-L424. In general, the variable names I use in the code match the notation in the Medium article, so it should be pretty clear what's going on under the hood once you've got the math down.
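For example, something like this should work. The attribute names below are my shorthand based on the notation; verify against _reset_state() in rankfm/rankfm.py for the exact names the package stores:

```python
import numpy as np

# fitted model internals (names follow the Medium article notation; the exact
# attribute names are an assumption here, check _reset_state() to confirm)
item_weights = model.w_i   # scalar item weights (unpersonalized popularity adjustments)
user_factors = model.v_u   # latent factors for each user
item_factors = model.v_i   # latent factors for each item

# item-item cosine similarity computed from the latent factor space only,
# mirroring the linked equations; the scalar weights never enter the calculation
unit = item_factors / np.linalg.norm(item_factors, axis=1, keepdims=True)
similarity = unit @ unit.T
```

In practice the similar_items() method does this for you and maps the results back to your original item IDs.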

I'm glad to hear it's working well for you! Tell your friends! I benchmarked performance against Implicit, LightFM, and SparkML (the main packages I'd used for this kind of thing in the past) on both Instacart and MovieLens, and RankFM was equal or superior to all of them on both data sets. xLearn is great for explicit feedback, but it's not optimized for implicit feedback problems, and using FMs without an implicit feedback training algorithm tends to do very poorly for item recommendation. One of the main reasons I wrote this package is that I couldn't find any existing libraries that combined generic FM models with learning-to-rank optimization. Unless you're dealing with text/image data, I haven't found NNs particularly effective or worth the trouble for item recommendation myself (see this too: https://arxiv.org/pdf/1907.06902.pdf). Good luck!

allard-jeff commented 4 years ago

Really helpful!

I don't see the biases or weights for the users, though. Are those missing?

The only issue I have seen so far is that if I use more than one user and item feature, there is an assertion error regarding non-finite weights. Taking logs, normalizing, etc. doesn't seem to help.

etlundquist commented 4 years ago

Yeah, good question. As a corollary of using pairwise loss functions [f(u, i) - f(u, j)] to train the model weights, there is actually no need for scalar user biases. The partial derivative of the loss function with respect to those terms is zero, and they aren't necessary for prediction either: the task at hand is finding the best items for each user, not the best users for each item, so a per-user bias that shifts all of that user's utility scores up/down uniformly wouldn't change the ranking.
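For concreteness, writing the utility in the usual FM notation as f(u, i) = w_u + w_i + <v_u, v_i> (plus any feature terms), the pairwise difference is

f(u, i) - f(u, j) = (w_i - w_j) + <v_u, v_i - v_j>

so the user bias w_u cancels out of every training pair and its gradient is identically zero.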

Hmm, that's interesting. I did run into exploding weights/gradients when using side features in my testing, but the problem was fixed by normalizing the auxiliary features onto [0, 1] and using a fairly strong auxiliary feature regularization factor (beta). That might be worth a try if you haven't already. I may introduce gradient clipping for numeric stability in a later version.
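Roughly what I mean, as a sketch continuing the frames from the example further up the thread (column names are illustrative; beta is the auxiliary feature regularization term mentioned above):

```python
from rankfm.rankfm import RankFM

# min-max scale every auxiliary feature column onto [0, 1] before fitting
# (interactions / user_features as in the earlier sketch)
feature_cols = [c for c in user_features.columns if c != "user_id"]
col_min = user_features[feature_cols].min()
col_max = user_features[feature_cols].max()
user_features[feature_cols] = (user_features[feature_cols] - col_min) / (col_max - col_min)

# bump up the auxiliary feature regularization strength (beta) to keep the weights finite
model = RankFM(factors=20, loss="bpr", beta=0.1)
model.fit(interactions, user_features=user_features, epochs=20)
```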

allard-jeff commented 4 years ago

Interesting. I wondered about that, but then saw that LightFM returns them even with no auxiliary features. I tested using fit_partial and the biases are being updated during SGD.

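For reference, roughly how I checked, on toy data (user_biases is the attribute LightFM exposes):

```python
import numpy as np
from scipy.sparse import coo_matrix
from lightfm import LightFM

# toy binary implicit interactions matrix (users x items)
interactions = coo_matrix(np.array([[1, 0, 1],
                                    [0, 1, 0],
                                    [1, 1, 0]], dtype=np.float32))

model = LightFM(no_components=10, loss="bpr")
model.fit_partial(interactions, epochs=1)
before = model.user_biases.copy()

model.fit_partial(interactions, epochs=5)
after = model.user_biases

print(np.allclose(before, after))  # False: the per-user biases keep moving during SGD
```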

I had standardized rather than normalized to [0, 1]. Normalizing fixed the issue. Thanks!