topspinj opened this issue 6 years ago
I'm adding some basic support for cross-validation (targeting metrics like p@k, map@k and ndcg@k to start off with). It's a work in progress right now, but the initial changes are here: https://github.com/benfred/implicit/compare/eval
With this branch you can do something like:
```python
from implicit.evaluation import precision_at_k, train_test_split
from implicit.als import AlternatingLeastSquares
from implicit.datasets.movielens import get_movielens

movies, ratings = get_movielens("1m")
train, test = train_test_split(ratings)

# fit() is given the item-user matrix here
model = AlternatingLeastSquares(factors=128, regularization=20, iterations=15)
model.fit(train)

# the metrics take user-item matrices, hence the transposes
p = precision_at_k(model, train.T.tocsr(), test.T.tocsr(), K=10, num_threads=4)
```
You can check out the eval branch if you want to try this out today (the example above should work with this branch). I'm hoping to have this finished up later this week.
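For readers new to these metrics, here is a minimal pure-Python sketch of what p@k, map@k and ndcg@k measure for a single user, given that user's ranked recommendations and held-out test items. The helper names are hypothetical and these are the textbook definitions with binary relevance; implicit's own implementation may normalise some of them differently, and it averages the per-user values over all evaluated users.

```python
import math


def user_precision_at_k(recommended, relevant, k):
    """Share of the top-k recommended items that are in the user's held-out set."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def user_average_precision_at_k(recommended, relevant, k):
    """AP@k: average of precision@i at each rank i that is a hit, divided by min(k, |relevant|)."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / min(k, len(relevant))


def user_ndcg_at_k(recommended, relevant, k):
    """Binary-relevance nDCG@k: discounted gain of the ranking over the best possible one."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(recommended[:k], start=1) if item in relevant)
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal if ideal > 0 else 0.0


# One user: items 2 and 4 are hits at ranks 2 and 4; item 5 was never recommended.
recommended, relevant = [1, 2, 3, 4], {2, 4, 5}
print(user_precision_at_k(recommended, relevant, 4))          # 0.5
print(user_average_precision_at_k(recommended, relevant, 4))  # 1/3, from (1/2 + 2/4) / 3
```

map@k is then the mean of AP@k over users, and ndcg@k rewards hits more the closer they are to the top of the list.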
👋 Where can I find the eval branch to test out the evaluation module?
EDIT: OK, great, I can see it in this commit: https://github.com/benfred/implicit/commit/861713e6cb4e65d7485abfab5e843d4872bf4bd1
@qjflores It's in master/PyPI now. I'm just leaving this open because I need to add some documentation for these functions.
Hey @benfred, thank you for the great tool you've created! Maybe you could help me find out why the code above behaves like this:
The 100% bar is the model-training progress, but the 49% bar is the P@K evaluation.
@AFimin That doesn't look right: given the code above, there should only be one progress bar for the fit and one for the evaluation.
When running the code snippet above, it should look something like this:
```
In [1]: from implicit.evaluation import precision_at_k, train_test_split
   ...: from implicit.als import AlternatingLeastSquares
   ...: from implicit.datasets.movielens import get_movielens
   ...:
   ...: movies, ratings = get_movielens("1m")
   ...: train, test = train_test_split(ratings)
   ...:
   ...: model = AlternatingLeastSquares(factors=128, regularization=20, iterations=15)
   ...: model.fit(train)
   ...:
   ...: p = precision_at_k(model, train.T.tocsr(), test.T.tocsr(), K=10, num_threads=4)
   ...:
100%|█████████████████████████████████████████████████████████████████████████| 15.0/15 [00:02<00:00, 5.43it/s]
100%|████████████████████████████████████████████████████████████████████▉| 6036/6041 [00:01<00:00, 3208.17it/s]
```
The first progress bar is model fitting and the second is the evaluation.
I'm not sure why you are seeing repeated progress bars there. Are you training/evaluating in a loop, or using the 'fit_callback' functionality?
You can turn off these progress bars by passing 'show_progress=False' to model.fit or precision_at_k, if that helps.
As for the evaluation progress bar not reaching 100%, I'm guessing it's because of these lines: https://github.com/benfred/implicit/blob/393de3f4e4a6b73eb051ed236a94272cabdfe548/implicit/evaluation.pyx#L97-L99
We skip evaluating a user if that user doesn't have any items in the test set, but right now the progress bar isn't incremented in that case. This could keep the progress bar from reaching 100% during evaluation. For your dataset, do only 49% of users have items in the test set?
@benfred Thanks for such a quick response, and sorry for the confusion. Let me clarify: 1) The repeated progress bars are expected; I'm running cross-validation for hyperparameter tuning. 2) And yes, my question was about the evaluation bar not reaching 100%, which made me think I was doing something wrong. Anyway, you've confirmed my assumptions, thank you again!
Awesome! Glad it's not something really weird anyway =).
I've put in a fix here: https://github.com/benfred/implicit/commit/363c9875c13146c9e50c07d8452a4bba55751aad. With it, the progress bars should reach 100% during cross-validation even when some users have no items in the test set.
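The skip-and-still-count-progress behaviour described above can be sketched in a few lines of pure Python. This is not implicit's Cython code: `evaluate_precision_at_k`, its callable `recommend` argument and the `on_user_done` progress hook are hypothetical, and precision here simply divides by k. Users without held-out items are left out of the metric, but the progress hook still fires for them, which is why the bar can reach 100%.

```python
def evaluate_precision_at_k(recommend, test_user_items, k, on_user_done=None):
    """Precision@k averaged over users that have held-out items.

    recommend:        callable(user) -> ranked list of item ids (stand-in for a model)
    test_user_items:  dict mapping every user to the set of items held out for them
    on_user_done:     optional callback run once per user, e.g. a progress-bar update
    """
    hits, possible = 0, 0
    for user, relevant in test_user_items.items():
        if relevant:  # users with no test items are skipped for the metric ...
            top_k = recommend(user)[:k]
            hits += sum(1 for item in top_k if item in relevant)
            possible += k
        if on_user_done is not None:
            on_user_done()  # ... but still count as progress, so the bar reaches 100%
    return hits / possible if possible else 0.0


recs = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
held_out = {0: {1}, 1: set(), 2: {5, 6}}
ticks = []
p = evaluate_precision_at_k(recs.__getitem__, held_out, k=2, on_user_done=lambda: ticks.append(1))
print(p, len(ticks))  # 0.75 3
```

User 1 contributes nothing to the 0.75, yet all three users are ticked off.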
@benfred What a great library!
I am using precision_at_k to tune the model's parameters, including two parameters of my custom importance function, the regularization parameter, the number of factors and the number of iterations. I can see that while training runs on the GPU, evaluation runs on the CPU. Evaluation is about 5 times slower, so most of the time is spent in evaluation. Is there any way to move the evaluation onto the GPU as well?
@Acey25 Same issue here =\ I followed this https://gist.github.com/jbochi/2e8ddcc5939e70e5368326aa034a144e and it doesn't work.
Does this work in Python 3.7?
@rituk I'm not sure; I haven't tried.
I think this is the main bottleneck of ALS and other matrix-factorization-based algorithms. Computing map@k or other ranking metrics on the test set is very slow. On my dataset, training takes ~20 s, but evaluation on 1% of this data takes 10 minutes, which is too slow for parameter tuning.
I don't know if it's possible, but having the metrics computed on the GPU would be awesome.
I found a way to leverage the GPU using the CuPy library (cupy.argpartition), which allows faster computation of recommendations on the test set (and thus a faster map@k).
@Phildumoux Are you willing to share what you found? I am running into a similar bottleneck. @benfred Any recommendations here for speeding up the metrics on the GPU?
Please do share details.
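The speed-up discussed above comes from selecting only the top K scores per user instead of fully sorting every item score. Below is a CPU-only, pure-Python sketch of that idea, using `heapq.nlargest` as a swapped-in stand-in for argpartition; the comment above suggests `cupy.argpartition` over a batch of user-item scores plays the same role on the GPU. `top_k_items` and the list-of-lists factor layout are assumptions made for illustration.

```python
import heapq


def top_k_items(user_vector, item_factors, k):
    """Ids of the k items whose factor vectors have the largest dot product with the user's.

    heapq.nlargest keeps only k candidates while scanning (O(n log k)) rather than
    sorting all n scores; argpartition-style selection applies the same idea to a
    whole batch of users at once.
    """
    scores = ((sum(u * v for u, v in zip(user_vector, factors)), item)
              for item, factors in enumerate(item_factors))
    return [item for _, item in heapq.nlargest(k, scores)]


item_factors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, 0.0]]
print(top_k_items([1.0, 0.0], item_factors, 2))  # [3, 0]
```

One design difference: argpartition returns the top K in no particular order, so ranking metrics such as map@k and ndcg@k still need a small sort of those K items afterwards, whereas `nlargest` already returns them best first.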
Hi @benfred, how do I get recommendations for a user if the model was trained on a user_item_rating matrix?
My understanding is that the model in the initial guide is trained on the `item_user_data` matrix, while the ranking_metrics_at_k function takes train_user_items as input, and the following line reads the numbers of users and items as `cdef int users = test_user_items.shape[0], items = test_user_items.shape[1]`.
Thank you!
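The orientation issue can be illustrated without implicit at all. The sketch below uses a hypothetical COO-style `SparseMatrix` stand-in (not scipy's class) to show that a function reading `shape[0]` as the number of users, like the quoted `cdef int users = test_user_items.shape[0]` line, only gets the right counts when an item-user matrix is transposed first. That is what the `train.T.tocsr()` calls in the original example do.

```python
from dataclasses import dataclass, field


@dataclass
class SparseMatrix:
    """Tiny stand-in for a scipy sparse matrix: a shape plus {(row, col): value} entries."""
    shape: tuple
    entries: dict = field(default_factory=dict)

    @property
    def T(self):
        """Transpose: rows become columns, so item-user turns into user-item."""
        n_rows, n_cols = self.shape
        return SparseMatrix((n_cols, n_rows), {(c, r): v for (r, c), v in self.entries.items()})


def count_users_and_items(test_user_items):
    """Same convention as the quoted Cython line: rows are users, columns are items."""
    return test_user_items.shape[0], test_user_items.shape[1]


# 3 items x 5 users, e.g. the item-user matrix a model was fit on
item_user = SparseMatrix((3, 5), {(0, 4): 1.0, (2, 1): 3.0})
print(count_users_and_items(item_user))    # (3, 5): items mistaken for users
print(count_users_and_items(item_user.T))  # (5, 3): correct after transposing
```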
It does work with 3.7; I have implemented it.
Hi @rituk, could you give an example? Thank you!
Are you looking for the GPU version? I followed this example: https://github.com/benfred/implicit/blob/master/examples/lastfm.py
I am using it with CUDA; please help. Here is my code. I highlighted in bold where the problem should be corrected:
```python
import numpy as np
import pandas as pd
from scipy import sparse
import implicit
from implicit.evaluation import train_test_split, ranking_metrics_at_k

# dataImplicit: my DataFrame with 'user', 'id_item' and 'ratingFiltrado' (filtered rating) columns
ratings = dataImplicit['ratingFiltrado'].astype(float)
sparse_item_user = sparse.csr_matrix((ratings, (dataImplicit['id_item'], dataImplicit['user'])))
sparse_user_item = sparse.csr_matrix((ratings, (dataImplicit['user'], dataImplicit['id_item'])))

np.random.seed(1234)
user_item_train, user_item_test = train_test_split(sparse_user_item, train_percentage=0.75)

# Building the model: one ALS model per number of latent factors
als_params = dict(regularization=0.1, iterations=20, use_gpu=implicit.cuda.HAS_CUDA, use_cg=True)
modelALS = implicit.als.AlternatingLeastSquares(factors=50, **als_params)
modelALS2 = implicit.als.AlternatingLeastSquares(factors=100, **als_params)
modelALS3 = implicit.als.AlternatingLeastSquares(factors=150, **als_params)
modelALS4 = implicit.als.AlternatingLeastSquares(factors=200, **als_params)  # This is the best one
modelALS5 = implicit.als.AlternatingLeastSquares(factors=250, **als_params)
modelALS6 = implicit.als.AlternatingLeastSquares(factors=300, **als_params)

n = 5  # Number of top-N recommendations
alpha_val = 40
# alpha_val = 1
data_conf = (user_item_train * alpha_val).astype('double')

# We test the different algorithms
benchmark = []
# Iterate over all algorithms
for algorithm in [modelALS, modelALS2, modelALS3, modelALS4, modelALS5, modelALS6]:
    algorithm.fit(data_conf)
    resultadosTotales = ranking_metrics_at_k(algorithm, user_item_train.T.tocsr(), user_item_test.T.tocsr(),
                                             K=n, num_threads=4, show_progress=True)
    benchmark.append(resultadosTotales)

pd.DataFrame(benchmark)
modeloSeleccionado = modelALS3

# Get recommendations
user_items_ok = user_item_train.T.tocsr()
recs = modeloSeleccionado.recommend(userid=100452, user_items=user_items_ok, recalculate_user=True)
```
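In the loop above, `benchmark` only collects the metric results, so `pd.DataFrame(benchmark)` does not record which model produced each row and the best model is then picked by hand. A hedged sketch of an alternative: append a `(setting, metrics)` pair per model (for example the number of factors together with the result of ranking_metrics_at_k) and pick the best setting automatically. `best_setting` is hypothetical, and the `"map"` key assumes each result is a dict with such a key; check which keys your version of ranking_metrics_at_k actually returns.

```python
def best_setting(results, metric):
    """Pick the hyperparameter setting whose metrics dict scores highest on `metric`.

    results: list of (setting, metrics) pairs, e.g. (number of factors, {"map": ...}).
    Keeping the setting next to its metrics avoids losing track of which model
    produced which row of the benchmark table.
    """
    setting, metrics = max(results, key=lambda pair: pair[1][metric])
    return setting, metrics[metric]


# Illustrative numbers only, not results from this thread
results = [(50, {"map": 0.10}), (150, {"map": 0.30}), (200, {"map": 0.20})]
print(best_setting(results, "map"))  # (150, 0.3)
```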
I'm not sure what the error is, but you can pass the transposed matrix to the **model.recommend** call.
Example:
```python
for item, score in model.recommend(user_id_dict[username], df_weighted_T,
                                   filter_items=reference_articles, filter_already_liked_items=True, N=10):
    ...  # use each recommended item and its score here
```
where df_weighted_T is the user-item matrix.
What is the order of the columns of the df_weighted_T DataFrame, and what is the format of the training data (user-item or item-user)?
What is reference_articles? Thanks!
df_weighted_T is user-item. The model is trained on the item-user matrix, which is transposed to user-item for the model.recommend call. I'll have to look it up, since I'm not near the code. See if this code helps:
https://github.com/benfred/implicit/blob/master/tests/recommender_base_test.py
OK, but the problem is that the metric functions take a user-item matrix as input. How can I adjust the parameters if the model was trained on an item-user matrix? Remember that ranking_metrics_at_k has the line `cdef int users = test_user_items.shape[0], items = test_user_items.shape[1]`, so the metrics would be reversed.
Any parameter you want to adjust in the **model.recommend** call is basically a filter applied to the matrix. I think it should be doable.
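To make the "filters applied before ranking" idea concrete, here is a rough pure-Python sketch whose parameter names mirror the recommend call quoted earlier (`filter_already_liked_items`, `filter_items`, `N` becomes `n`). It is not implicit's implementation: `recommend_filtered` and its list-of-scores input are hypothetical. Blocked items are removed first and the remaining items are ranked by score, returning `(item, score)` pairs like the loop in rituk's example.

```python
def recommend_filtered(scores, liked_items, n, filter_already_liked_items=True, filter_items=()):
    """Return up to n (item, score) pairs, best first, after applying the filters.

    scores:      list where scores[item] is the model's score for that item
    liked_items: items the user already interacted with (their row of the user-item matrix)
    """
    blocked = set(filter_items)
    if filter_already_liked_items:
        blocked |= set(liked_items)
    candidates = [(item, score) for item, score in enumerate(scores) if item not in blocked]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:n]


scores = [0.9, 0.1, 0.8, 0.5]
for item, score in recommend_filtered(scores, liked_items={0}, n=2, filter_items=[2]):
    print(item, score)  # prints "3 0.5" then "1 0.1"
```

This is also why the user-item row matters for recommend: it is where the "already liked" items come from.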
@benfred The code above now throws this error: `IndexError: index 3953 is out of bounds for axis 0 with size 3953`
@essefi-ahlem If you replace the last line as follows, it will work:
```python
p = precision_at_k(model, train, test, K=10, num_threads=4)
```
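One plausible reading of that IndexError (an assumption, not confirmed in the thread): the model's factor tables and the evaluation matrix disagree on which axis holds users, so evaluation walks over the 6041 MovieLens users while the user factor table only has 3953 rows, and the first out-of-range index is 3953. With the one-line change above, which rows count as users depends on the orientation your version of `fit()` expects. A cheap guard is to compare shapes before evaluating. `check_orientation` is a hypothetical helper; with implicit you might pass `model.user_factors.shape[0]` and `model.item_factors.shape[0]` from a CPU model.

```python
def check_orientation(n_user_factors, n_item_factors, user_items_shape):
    """Raise a clear error if a user-item matrix does not line up with the model.

    n_user_factors / n_item_factors: number of rows in the model's user and item
    factor tables; user_items_shape: (rows, cols) of the matrix handed to a metric.
    """
    n_rows, n_cols = user_items_shape
    if (n_rows, n_cols) != (n_user_factors, n_item_factors):
        raise ValueError(
            f"matrix is {n_rows} x {n_cols}, but the model has {n_user_factors} users "
            f"and {n_item_factors} items; is the matrix transposed?")


# Factors fit with the 3953 movie rows treated as users, evaluated on a 6041 x 3953 matrix
try:
    check_orientation(3953, 6041, (6041, 3953))
except ValueError as err:
    print(err)
```

The guard uses strict equality because train_test_split returns train and test matrices with the same shape.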
I would like to tune hyperparameters with implicit's AlternatingLeastSquares. Ideally, I would use cross-validation, but it seems like there is no simple way to "fit" on training data and "predict" on test data. Any thoughts on how to handle this? Thanks!
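train_test_split, as used earlier in this thread, holds out a random share of the interactions once. A k-fold variant of the same idea is sketched below in pure Python: every (user, item, value) triple is held out exactly once, the model would be fit on the remaining triples, and a ranking metric such as precision@k would be computed on the held-out ones. `k_fold_interactions` and the triple format are hypothetical, and converting each split back into a sparse matrix is left out.

```python
import random


def k_fold_interactions(interactions, k, seed=0):
    """Yield (train, test) lists of (user, item, value) triples for k-fold cross-validation.

    Every interaction is held out in exactly one fold. Splitting interactions rather
    than users keeps most users in the training data, which a factorization model
    needs in order to have a factor vector for them at test time (a user whose
    interactions all land in one fold would still drop out of that fold's training set).
    """
    shuffled = list(interactions)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for held_out in range(k):
        train = [row for f, fold in enumerate(folds) if f != held_out for row in fold]
        yield train, folds[held_out]


interactions = [(user, item, 1.0) for user in range(4) for item in range(5)]
for train, test in k_fold_interactions(interactions, k=4):
    # fit the model on `train`, then compute e.g. precision@k on `test`
    print(len(train), len(test))  # 15 5 on each of the 4 folds
```

Averaging the metric over the k folds gives a less noisy estimate per hyperparameter setting than a single hold-out split.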