PreferredAI / cornac

A Comparative Framework for Multimodal Recommender Systems
https://cornac.preferred.ai
Apache License 2.0

[Bug/Feature] Maybe add online/streaming updates to models? #452

Closed florin-stats closed 2 years ago

florin-stats commented 2 years ago

Hi!

Is it possible to have online/streaming updates to the models in Cornac? More precisely, is it possible to update a Cornac model WITHOUT retraining the entire thing when new users and new items come in? Initially I thought the init_params entry could be used for that, since maybe I could use pre-existing user/item latent factors to "append" new data, but that is not the case. Maybe I misunderstood what these parameters do? What exactly is the trainable parameter for?

I've made a Jupyter notebook testing init_params and trainable=True/False, but they don't seem to do anything. Is this a bug? The latent item/user factors remain the same. The train/test procedure was set up so that I re-train on completely new items and users, but even though I do a random train/test split, the results are the same: the shapes and values of the user/item latent factors do not change.

I am not sure whether this is a bug or the feature of online/streaming updates is not yet implemented.

Below is an ASCII excerpt from a jupyter notebook that I've made to test my claims.

I've obfuscated any personal data that might pop up. If you need more clarity, please let me know!

Thanks!

+*In[66]:*+
[source, ipython3]
----
import pandas as pd 
import numpy as np 
from numpy import log, sqrt, exp

import sys 
import os 
import json 
import janitor

from itertools import product
from random import sample 

from cornac.models import SVD, VAECF, BPR, MMMF, NMF, HPF, PMF
from cornac.models import BaselineOnly, GlobalAvg, MostPop

from cornac import Experiment 
from cornac.metrics import MAE, RMSE, Precision, Recall, NDCG, AUC, MAP
from cornac.eval_methods import RatioSplit, StratifiedSplit
# https://arxiv.org/pdf/2104.08912.pdf
from cornac.eval_methods import PropensityStratifiedEvaluation

from cornac.hyperopt import Discrete, Continuous
from cornac.hyperopt import GridSearch, RandomSearch

from cornac.data import Dataset

from statsmodels.distributions.empirical_distribution import ECDF

----

+*In[79]:*+
[source, ipython3]
----
# Split by users and by items to simulate a new-users/new-items situation.
p = 0.8

all_users = set(df[USER_COLUMN])
# random.sample() needs a sequence (sets are rejected in recent Python versions)
train_users = set(sample(list(all_users), int(df[USER_COLUMN].nunique() * p)))
test_users = all_users - train_users

all_items = set(df[ITEM_COLUMN])
train_items = set(sample(list(all_items), int(df[ITEM_COLUMN].nunique() * p)))
test_items = all_items - train_items  # items held out from training

# Training rows: only train users interacting with train items.
df_train = df.loc[df[USER_COLUMN].isin(train_users) & df[ITEM_COLUMN].isin(train_items),
                  [USER_COLUMN, ITEM_COLUMN, RATING_COLUMN]]
# Test rows: only unseen users interacting with unseen items.
df_test = df.loc[df[USER_COLUMN].isin(test_users) & df[ITEM_COLUMN].isin(test_items),
                 [USER_COLUMN, ITEM_COLUMN, RATING_COLUMN]]

----

+*In[80]:*+
[source, ipython3]
----
# create Cornac test/train sets before fitting 
train_set = Dataset.from_uir(df_train.itertuples(index=False))
test_set = Dataset.from_uir(df_test.itertuples(index=False))
----

+*In[81]:*+
[source, ipython3]
----
model = BPR(k=64, max_iter=100, learning_rate=0.01, lambda_reg=0.01, seed=SEED, verbose=True)
----

+*In[82]:*+
[source, ipython3]
----
model.fit(train_set)
----

+*Out[82]:*+
----
  0%|          | 0/100 [00:00<?, ?it/s]
Optimization finished!
<cornac.models.bpr.recom_bpr.BPR at 0x7f472a83c710>
----

+*In[83]:*+
[source, ipython3]
----
# init_params keys: {'U': user_factors, 'V': item_factors, 'Bi': item_biases}
init_params = {"U": model.u_factors, "V": model.i_factors, "Bi": model.i_biases}
----

+*In[84]:*+
[source, ipython3]
----
model_upd = BPR(k=64, max_iter=100, learning_rate=0.01, lambda_reg=0.01, 
                trainable=True, init_params=init_params, seed=SEED,  verbose=True)
----

+*In[85]:*+
[source, ipython3]
----
model_upd
----

+*Out[85]:*+
----
<cornac.models.bpr.recom_bpr.BPR at 0x7f49182567f0>
----

+*In[86]:*+
[source, ipython3]
----
model_upd.fit(test_set)
----

+*Out[86]:*+
----
  0%|          | 0/100 [00:00<?, ?it/s]
Optimization finished!
<cornac.models.bpr.recom_bpr.BPR at 0x7f49182567f0>
----

+*In[87]:*+
[source, ipython3]
----
init_params_upd = {"U": model_upd.u_factors, "V": model_upd.i_factors, "Bi": model_upd.i_biases}
----

+*In[88]:*+
[source, ipython3]
----
print(init_params['Bi'].shape)
print(init_params_upd['Bi'].shape)
print("-" * 50)
print(init_params['Bi'])
print("=" * 50)
print(init_params_upd['Bi'])
----

+*Out[88]:*+
----
(10062,)
(10062,)
--------------------------------------------------
[ 3.1520467   0.13694532 -0.5948288  ... -2.5533526  -2.5522206
 -2.46375   ]
==================================================
[ 3.1520467   0.13694532 -0.5948288  ... -2.5533526  -2.5522206
 -2.46375   ]
----

+*In[89]:*+
[source, ipython3]
----
print(init_params['V'].shape)
print(init_params_upd['V'].shape)
print("-" * 50) 
print(init_params['V'])
print("=" * 50)
print(init_params_upd['V'])
----

+*Out[89]:*+
----
(10062, 64)
(10062, 64)
--------------------------------------------------
[[ 0.01305119  0.02446645  0.01209763 ... -0.06412598  0.04972798
  -0.0367249 ]
 [ 0.03434233 -0.05804791  0.50744015 ... -0.22468227 -0.02416496
   0.2276108 ]
 [ 0.3419299  -0.32397678  0.28592044 ... -0.0678139   0.0811412
   0.1422737 ]
 ...
 [ 0.01135851 -0.02270988  0.01306135 ... -0.01896087  0.00118663
   0.01033379]
 [-0.03619554 -0.00582229  0.06442755 ...  0.04406089 -0.09079324
   0.13692585]
 [-0.00969888 -0.06196788  0.039554   ... -0.04377511 -0.05868173
   0.09974333]]
==================================================
[[ 0.01305119  0.02446645  0.01209763 ... -0.06412598  0.04972798
  -0.0367249 ]
 [ 0.03434233 -0.05804791  0.50744015 ... -0.22468227 -0.02416496
   0.2276108 ]
 [ 0.3419299  -0.32397678  0.28592044 ... -0.0678139   0.0811412
   0.1422737 ]
 ...
 [ 0.01135851 -0.02270988  0.01306135 ... -0.01896087  0.00118663
   0.01033379]
 [-0.03619554 -0.00582229  0.06442755 ...  0.04406089 -0.09079324
   0.13692585]
 [-0.00969888 -0.06196788  0.039554   ... -0.04377511 -0.05868173
   0.09974333]]
----

+*In[90]:*+
[source, ipython3]
----
print(init_params['U'].shape)
print(init_params_upd['U'].shape)
print("-" * 50)
print(init_params['U'])
print("=" * 50)
print(init_params_upd['U'])
----

+*Out[90]:*+
----
(124363, 64)
(124363, 64)
--------------------------------------------------
[[-0.02319657 -0.01407164  0.05480259 ... -0.03141589  0.01111291
   0.02920822]
 [-0.02930616  0.01841966 -0.09741018 ...  0.0887912   0.05942525
  -0.00703447]
 [ 0.15145437 -0.5022358   0.20136377 ... -0.48140875 -0.17037101
   0.04936205]
 ...
 [-0.01436977 -0.00892285 -0.00772794 ...  0.0034506   0.00002683
  -0.00882841]
 [-0.00565993 -0.00651298  0.00216395 ... -0.00764577  0.00397873
  -0.00386099]
 [ 0.00032684  0.00132209 -0.00376659 ... -0.00445988 -0.00489008
  -0.00139413]]
==================================================
[[-0.02319657 -0.01407164  0.05480259 ... -0.03141589  0.01111291
   0.02920822]
 [-0.02930616  0.01841966 -0.09741018 ...  0.0887912   0.05942525
  -0.00703447]
 [ 0.15145437 -0.5022358   0.20136377 ... -0.48140875 -0.17037101
   0.04936205]
 ...
 [-0.01436977 -0.00892285 -0.00772794 ...  0.0034506   0.00002683
  -0.00882841]
 [-0.00565993 -0.00651298  0.00216395 ... -0.00764577  0.00397873
  -0.00386099]
 [ 0.00032684  0.00132209 -0.00376659 ... -0.00445988 -0.00489008
  -0.00139413]]
----

+*In[91]:*+
[source, ipython3]
----
# check to see whether the matrices are different 
print(np.sum(init_params['Bi'] != init_params_upd['Bi']))
print(np.sum(init_params['U'] != init_params_upd['U']))
print(np.sum(init_params['V'] != init_params_upd['V']))
----

+*Out[91]:*+
----
0
0
0
----
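
A note on reading the zero counts above: an element-wise comparison like this can only detect a change if the "before" arrays are independent copies. The sketch below uses plain NumPy and a made-up in-place update (`sgd_step_in_place` is hypothetical, not a Cornac function) to show that a dict holding a reference to an array reports zero differences after that array is modified in place, while a `.copy()` snapshot does not. Whether Cornac's BPR updates the arrays passed through init_params in place is an assumption that has not been verified here; `np.shares_memory` is one way to check it.

```python
import numpy as np


def sgd_step_in_place(factors, grad, lr=0.1):
    """Hypothetical stand-in for a trainer that updates its input array in place."""
    factors -= lr * grad  # modifies the caller's array, no new array is created
    return factors


rng = np.random.default_rng(0)
U = rng.normal(size=(4, 3))

before_alias = {"U": U}        # stores a reference to the same buffer
before_copy = {"U": U.copy()}  # stores an independent snapshot

after = {"U": sgd_step_in_place(U, grad=np.ones_like(U))}

# Same check as in the notebook: count differing elements.
n_diff_alias = int(np.sum(before_alias["U"] != after["U"]))
n_diff_copy = int(np.sum(before_copy["U"] != after["U"]))

# True when both names point at the same underlying memory.
shares = np.shares_memory(before_alias["U"], after["U"])

print(n_diff_alias, n_diff_copy, shares)
```

If `np.shares_memory(init_params['U'], init_params_upd['U'])` returned True in the notebook, the comparison would be uninformative, and taking `.copy()` snapshots before the second fit would be needed to see any update.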

tqtg commented 2 years ago

Currently, we don't support dynamically adding users/items because the focus is still on controlled experiments. That said, it's an important feature that we're looking into.

Just to clarify, init_params and trainable are intended more for model saving/loading.
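
For readers who want to experiment with the "append" idea from the question outside of Cornac, here is a minimal NumPy sketch. It is not a Cornac API: `extend_factors` is a hypothetical helper that keeps the learned rows of a latent-factor matrix and adds small randomly initialized rows for new users or items. Mapping new user/item ids to the appended row indices, and any further training on them, is left to the caller.

```python
import numpy as np


def extend_factors(factors, n_new, scale=0.01, seed=None):
    """Return a new matrix with `n_new` randomly initialized rows appended.

    Hypothetical helper, not part of Cornac. Existing rows are kept unchanged;
    new rows are drawn from N(0, scale^2) and cast to the input dtype.
    """
    rng = np.random.default_rng(seed)
    new_rows = rng.normal(0.0, scale, size=(n_new, factors.shape[1]))
    return np.vstack([factors, new_rows.astype(factors.dtype)])


# Toy example: 3 known users with k=2 latent factors, then 2 new users arrive.
old_U = np.arange(6, dtype=np.float32).reshape(3, 2)
new_U = extend_factors(old_U, n_new=2, seed=42)

print(new_U.shape)
```

The extended matrix only helps if the model that consumes it uses the same row indexing for the old users/items as the model that produced the original factors.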