benfred / implicit

Fast Python Collaborative Filtering for Implicit Feedback Datasets
https://benfred.github.io/implicit/
MIT License

Validating partial_fit_users and partial_fit_items vs model.fit #650

Open Selva163 opened 1 year ago

Selva163 commented 1 year ago

I'm trying to use ALS for real-time recommendations, but I'm seeing differences in the recommended items between the model trained on the full data and the model trained incrementally on new data. Steps followed:

  1. Train the model on the entire data set (the last 30 days of user-item interactions) and save the model as a pickle file.
  2. For real-time updates, load the saved (pickled) model, get the interactions from the last 5 minutes, and use the partial_fit_users and partial_fit_items methods to incrementally train on the new data.
  3. Get the recommendations for the latest active users.
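To make the workflow concrete, steps 1 and 3 (plus the pickle round trip) can be sketched in miniature with NumPy. This is a hypothetical dense re-implementation of the implicit-feedback ALS objective from Hu, Koren and Volinsky (confidence c = 1 + alpha * r, binary preference p = r > 0), not implicit's own code: `solve_rows`, `als_fit`, `recommend_top_n`, the hyperparameter values and the toy matrix are all assumptions for illustration, and only the factor arrays are pickled rather than a full model object.

```python
import pickle

import numpy as np


def solve_rows(fixed, interactions, alpha, reg):
    """One ALS half-step: re-solve every row of `interactions` with the other
    side's factors (`fixed`) held constant.

    Assumed weighting: confidence c = 1 + alpha * r, preference p = (r > 0).
    """
    k = fixed.shape[1]
    gram = fixed.T @ fixed  # Y^T Y, shared by every row
    out = np.empty((interactions.shape[0], k))
    for i, row in enumerate(interactions):
        conf = 1.0 + alpha * row
        pref = (row > 0).astype(float)
        # Y^T C_u Y = Y^T Y + Y^T (C_u - I) Y, then add the L2 penalty
        a = gram + (fixed.T * (conf - 1.0)) @ fixed + reg * np.eye(k)
        b = (fixed.T * conf) @ pref  # Y^T C_u p(u)
        out[i] = np.linalg.solve(a, b)
    return out


def als_fit(user_items, factors=4, iterations=15, alpha=10.0, reg=0.1, seed=0):
    """Step 1 (hypothetical): full training on the whole user x item matrix."""
    rng = np.random.default_rng(seed)
    user_factors = np.zeros((user_items.shape[0], factors))
    item_factors = rng.normal(scale=0.01, size=(user_items.shape[1], factors))
    for _ in range(iterations):
        user_factors = solve_rows(item_factors, user_items, alpha, reg)
        item_factors = solve_rows(user_factors, user_items.T, alpha, reg)
    return user_factors, item_factors


def recommend_top_n(user_factors, item_factors, user_items, userid, n=3):
    """Step 3 (hypothetical): score all items for one user, drop seen items."""
    scores = item_factors @ user_factors[userid]
    scores[user_items[userid] > 0] = -np.inf
    return [int(i) for i in np.argsort(-scores)[:n]]


# Toy "last 30 days" history: 4 users x 5 items, values are interaction counts.
history = np.array([
    [3, 1, 0, 0, 0],
    [2, 0, 1, 0, 0],
    [0, 0, 0, 4, 1],
    [0, 1, 0, 2, 0],
], dtype=float)

user_factors, item_factors = als_fit(history)
blob = pickle.dumps((user_factors, item_factors))  # step 1: persist the factors
user_factors, item_factors = pickle.loads(blob)     # step 2: reload before updating
print(recommend_top_n(user_factors, item_factors, history, userid=0, n=2))
```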

For validation, I also retrained the model from scratch on the last 30 days plus the last 5 minutes and generated recommendations from it.

The results from incremental training and retraining are quite different.
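One way to make "quite different" measurable is to compare, per user, how many of the top-N items the two models share. A minimal sketch; `overlap_at_n`, `mean_overlap`, the dict-of-lists input format and the example item ids are hypothetical, not part of implicit.

```python
def overlap_at_n(recs_a, recs_b, n):
    """Share of the top-n items two ranked lists have in common (order ignored)."""
    return len(set(recs_a[:n]) & set(recs_b[:n])) / n


def mean_overlap(recs_by_user_a, recs_by_user_b, n=10):
    """Average overlap@n over the users present in both result dicts."""
    users = recs_by_user_a.keys() & recs_by_user_b.keys()
    if not users:
        raise ValueError("no users in common")
    total = sum(overlap_at_n(recs_by_user_a[u], recs_by_user_b[u], n) for u in users)
    return total / len(users)


# Hypothetical top-3 lists (item ids) for two active users.
incremental = {"u1": [10, 11, 12], "u2": [5, 6, 7]}
retrained = {"u1": [10, 12, 40], "u2": [8, 9, 7]}
# u1 shares 2 of its 3 items, u2 shares 1 of its 3
print(mean_overlap(incremental, retrained, n=3))
```

A useful baseline is the overlap between two full retrains with different random seeds: ALS starts from random factors, so two full fits on identical data need not produce identical rankings either.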

How can we get the same recommendations from both models? The idea is to save the few minutes that a full retrain takes. I haven't been able to find any real-life examples of partial_fit_users or partial_fit_items except in the docs and als_test.py.

Selva163 commented 1 year ago

@benfred @Focus could you please provide any suggestions on this?

benfred commented 1 year ago

> For real-time, load the saved(pickle) model, get the latest interactions in the last 5 mins, use partial_fit_users and partial_fit_items method to incrementally train on the new data.

For partial_fit_items/partial_fit_users, are you only including the interactions from the last 5 minutes? If so, I don't think that will work. The partial fit functions let you recalculate factors for only a subset of users/items, but for the users being updated you need to pass their full interaction history, not just the items they interacted with most recently.
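The reason is visible in the closed-form user update that ALS alternates over: with the item factors Y held fixed, a user's factor vector is recomputed from that user's entire interaction row, so passing only the newest events yields a vector describing the last 5 minutes rather than the user. A minimal NumPy sketch, assuming the Hu, Koren and Volinsky confidence weighting c = 1 + alpha * r; `solve_user`, `alpha`, `reg` and the toy data are hypothetical, and implicit's own solver is not reproduced here.

```python
import numpy as np


def solve_user(item_factors, user_row, alpha=10.0, reg=0.1):
    """Re-solve one user's factors with the item factors held fixed:
    x_u = (Y^T C_u Y + reg * I)^-1 Y^T C_u p(u),
    assuming confidence c = 1 + alpha * r and preference p = (r > 0).
    """
    k = item_factors.shape[1]
    conf = 1.0 + alpha * user_row
    pref = (user_row > 0).astype(float)
    weighted = item_factors.T * conf  # Y^T C_u, shape (k, n_items)
    a = weighted @ item_factors + reg * np.eye(k)
    b = weighted @ pref
    return np.linalg.solve(a, b)


rng = np.random.default_rng(0)
item_factors = rng.normal(size=(6, 3))  # stand-in for the reloaded model's item factors

full_history = np.array([4, 2, 0, 1, 0, 0], dtype=float)  # 30 days + last 5 minutes
last_5_min = np.array([0, 0, 0, 1, 0, 0], dtype=float)    # only the newest event

x_full = solve_user(item_factors, full_history)
x_recent = solve_user(item_factors, last_5_min)

# The solve sees only the row it is given, so dropping older events changes the factors.
print(np.round(x_full, 3))
print(np.round(x_recent, 3))
```

Applied to the workflow in the issue, the rows passed for each updated user would be that user's merged 30-day + 5-minute history, not the 5-minute slice. Even then, the incremental result should not be expected to match a full retrain exactly, since a retrain re-solves every user and item factor over several iterations.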

bahag-rehmanr1 commented 1 year ago

What about new users? Will they be accommodated too?