benfred / implicit

Fast Python Collaborative Filtering for Implicit Feedback Datasets
https://benfred.github.io/implicit/
MIT License
3.56k stars, 611 forks

Cross validation #108

Open topspinj opened 6 years ago

topspinj commented 6 years ago

I would like to tune hyper-parameters with implicit's AlternatingLeastSquares. Ideally, I would use cross-validation but it seems like there is no simple way to "fit" on training data and "predict" on test data.

Any thoughts on how to handle this? Thanks!

benfred commented 6 years ago

I'm adding some basic support for cross-validation (targeting metrics like p@k, map@k and ndcg@k to start off with). It's a work in progress right now, but the initial changes are here: https://github.com/benfred/implicit/compare/eval

With this branch you can do something like

from implicit.evaluation import precision_at_k, train_test_split
from implicit.als import AlternatingLeastSquares
from implicit.datasets.movielens import get_movielens

movies, ratings = get_movielens("1m")
train, test = train_test_split(ratings)

model = AlternatingLeastSquares(factors=128, regularization=20, iterations=15)
model.fit(train)

p = precision_at_k(model, train.T.tocsr(), test.T.tocsr(), K=10, num_threads=4)

You can check out the eval branch if you want to try this out today (the example above should work with this branch). I'm hoping to have this finished up later this week.
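For readers unfamiliar with the p@k metric mentioned above, here is a minimal sketch of the textbook per-user definition. The helper name `toy_precision_at_k` and the item ids are made up for illustration; implicit's own `precision_at_k` aggregates over all users, and its exact normalisation may differ from this sketch.

```python
def toy_precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that appear in the held-out set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


recommended = [10, 4, 7, 1, 9]  # ranked list from some model (made-up ids)
held_out = {4, 9, 3}            # items the user interacted with in the test split
print(toy_precision_at_k(recommended, held_out, k=5))
```

Two of the five recommended items are in the held-out set, so this toy user scores 2/5.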

qjflores commented 6 years ago

👋 where can I find the eval branch to test out the evaluation module?

EDIT: ok great I can see it here in this commit https://github.com/benfred/implicit/commit/861713e6cb4e65d7485abfab5e843d4872bf4bd1

benfred commented 6 years ago

@qjflores It's in master/pypi now - I'm just leaving this open because I need to add some documentation on these functions

AFimin commented 6 years ago

Hey @benfred, thank you for creating such a great tool! Maybe you could help me find out why the code above behaves like this:

[Screenshot, 2018-09-24 22:33: progress bar output]

The 100% bar is the model training progress, but the 49% bar is the P@K evaluation.

benfred commented 6 years ago

@AFimin that doesn't look right - there should only be one progress bar for the fit and one for the evaluation given the code above.

When running the code snippet above it should look something like:

In [1]: from implicit.evaluation import precision_at_k, train_test_split
   ...: from implicit.als import AlternatingLeastSquares
   ...: from implicit.datasets.movielens import get_movielens
   ...: 
   ...: movies, ratings = get_movielens("1m")
   ...: train, test = train_test_split(ratings)
   ...: 
   ...: model = AlternatingLeastSquares(factors=128, regularization=20, iterations=15)
   ...: model.fit(train)
   ...: 
   ...: p = precision_at_k(model, train.T.tocsr(), test.T.tocsr(), K=10, num_threads=4)
   ...: 

100%|█████████████████████████████████████████████████████████████████████████| 15.0/15 [00:02<00:00,  5.43it/s]
100%|████████████████████████████████████████████████████████████████████▉| 6036/6041 [00:01<00:00, 3208.17it/s]

Where the first progress bar is model fitting and the second is the evaluation.

I'm not sure why you are seeing repeated progress bars there - are you training/evaluating in a loop or using the 'fit_callback' functionality?

You can turn off these progress bars by passing in 'show_progress=False' to model.fit or precision_at_k if that helps.

For the evaluation progress bar not getting to 100% - I'm guessing it's because of these lines: https://github.com/benfred/implicit/blob/393de3f4e4a6b73eb051ed236a94272cabdfe548/implicit/evaluation.pyx#L97-L99

We skip evaluating a user if that user doesn't have any items in the test set, but right now the progress bar isn't incremented in that case. This could cause the progress bar to stop short of 100% during evaluation. For your dataset, do only 49% of users have items in the test set?
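The skipped-user effect described above can be illustrated with a toy loop. This is only a sketch of the idea, not the code in `evaluation.pyx`; the function name `count_ticks` and the sample data are hypothetical.

```python
def count_ticks(test_items_per_user, update_before_skip):
    """Return how many progress ticks a toy evaluation loop records."""
    ticks = 0
    for items in test_items_per_user:
        if update_before_skip:
            ticks += 1      # count every user, even ones that get skipped
        if not items:
            continue        # nothing held out for this user: skip scoring
        # ... score this user's recommendations here ...
        if not update_before_skip:
            ticks += 1      # only users that were actually scored are counted
    return ticks


users = [[1, 2], [], [3], [], []]  # made-up test sets; three users have none
print(count_ticks(users, update_before_skip=False), "of", len(users))
print(count_ticks(users, update_before_skip=True), "of", len(users))
```

Updating the counter before the early `continue` is what lets the bar reach its total even when some users have no test items.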

AFimin commented 6 years ago

@benfred Thanks for such a quick response, and excuse me for the confusion. Let me clarify. 1) The repeated progress bars are expected: I'm trying to do some CV for hyperparameter fitting. 2) And yes, the question was about the eval bar not hitting 100%, which made me think I was doing something wrong. Anyway, you've confirmed my assumptions, thank you again!
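For the repeated fit-and-evaluate runs mentioned here, one possible way to get k folds instead of a single `train_test_split` is to partition the observed interactions yourself. The sketch below uses plain Python lists; `kfold_interactions` and the sample triples are hypothetical and not part of implicit.

```python
import random


def kfold_interactions(interactions, n_folds=5, seed=0):
    """Yield (train, test) lists, holding out each fold of interactions once."""
    shuffled = list(interactions)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled interactions round-robin into n_folds buckets.
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [x for i, fold in enumerate(folds) if i != held_out for x in fold]
        yield train, test


# Made-up (user, item, weight) triples standing in for a real dataset.
data = [(u, i, 1.0) for u in range(4) for i in range(5)]
for train, test in kfold_interactions(data, n_folds=4):
    print(len(train), len(test))
```

Each `train` and `test` list could then be converted to a sparse matrix of the full shape and passed to `model.fit` and `precision_at_k`, averaging the metric over folds for each hyperparameter setting.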

benfred commented 6 years ago

Awesome! Glad it's not something really weird anyway =).

I've put in a fix here: https://github.com/benfred/implicit/commit/363c9875c13146c9e50c07d8452a4bba55751aad. I think this means that the progress bars will hit 100% during cross-validation even if there are users missing items in the test set.

civilinformer commented 6 years ago

@benfred What a great library!

I am using the precision_at_k to tune parameters of the model, including two parameters for my custom importance function, the regularization parameter, number of factors and number of iterations. I can see that while training is on the GPU, evaluation is on the CPU. Evaluation is about a factor of 5 slower, so most of the time is spent in evaluation. Is there any way to move the evaluation onto the GPU as well?

ifokeev commented 5 years ago

@Acey25 same issue here =\ following this https://gist.github.com/jbochi/2e8ddcc5939e70e5368326aa034a144e and it doesn't work

rituk commented 5 years ago

@benfred Regarding the evaluation functions now being in master/PyPI: does this work in Python 3.7?

qjflores commented 5 years ago

@rituk I'm not sure, I haven't tried.

philippestepniewskiperso commented 4 years ago

Regarding @civilinformer's question above about evaluation running on the CPU while training runs on the GPU: I think this is the main bottleneck of ALS or any other matrix-factorization-based algorithm. Computing map@k or other ranking metrics for the test set is very slow. On my dataset, training takes ~20s but evaluation on 1% of this data takes 10 minutes. That is too slow for parameter tuning.

I don't know if it's possible but having the metrics available with GPU computation would be awesome.

philippestepniewskiperso commented 4 years ago

I found a way to leverage the GPU using the cupy library (cupy.argpartition), which allows faster computation of recommendations on the test set (and thus a faster map@k).
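The comment above does not include code, so the following is only a guess at the general idea, written with NumPy on the CPU: score all items for all users with one matrix product, then use `argpartition` to pick the top k without fully sorting every row. CuPy documents a NumPy-compatible `argpartition`, so the same structure could in principle run on CuPy arrays, but that is untested here. The helper name `top_k_items` and the random factors are made up, and the sketch does not filter out items a user has already seen.

```python
import numpy as np


def top_k_items(user_factors, item_factors, k):
    """Return the indices of the k highest-scoring items for every user."""
    scores = user_factors @ item_factors.T  # shape (n_users, n_items)
    # argpartition moves the k largest scores into the last k columns, unordered.
    candidates = np.argpartition(scores, -k, axis=1)[:, -k:]
    # Sort only those k candidates per user, best first.
    candidate_scores = np.take_along_axis(scores, candidates, axis=1)
    order = np.argsort(-candidate_scores, axis=1)
    return np.take_along_axis(candidates, order, axis=1)


rng = np.random.default_rng(0)
users = rng.normal(size=(3, 8))   # made-up latent factors
items = rng.normal(size=(20, 8))
print(top_k_items(users, items, k=5))
```

The resulting index lists could then be compared against each user's held-out items to compute p@k or map@k in bulk.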

SheldonGrant commented 4 years ago

@Phildumoux Are you willing to share your cupy-based approach? I am running into a similar bottleneck. @benfred Any recommendations here for speeding up the metrics on the GPU?

rituk commented 4 years ago

@philippestepniewskiperso Please do share details of the cupy approach.

jselma commented 3 years ago

Hi @benfred, how do I get the recommendations for a user if the model was trained on a user-item rating matrix?

As I understand it, the recommendations in the introductory guide come from a model trained on the item_user data matrix, while the ranking_metrics_at_k function takes train_user_items as input, and its next line reads the number of users and items as: cdef int users = test_user_items.shape[0], items = test_user_items.shape[1]

Thank you!

rituk commented 3 years ago

It does work with Python 3.7; I have implemented it.

jselma commented 3 years ago

Hi @rituk, could you give an example? Thank you!

rituk commented 3 years ago

Are you looking for the GPU version? I followed this example: https://github.com/benfred/implicit/blob/master/examples/lastfm.py

jselma commented 3 years ago

I am using it with CUDA and would appreciate your help. This is my code; I highlighted in bold where the problem should be corrected:

import numpy as np
import pandas as pd
import implicit
from scipy import sparse
from implicit.evaluation import ranking_metrics_at_k, train_test_split

# dataImplicit is assumed to be a DataFrame with 'user', 'id_item' and 'ratingFiltrado' columns
sparse_item_user = sparse.csr_matrix((dataImplicit['ratingFiltrado'].astype(float), (dataImplicit['id_item'], dataImplicit['user'])))
sparse_user_item = sparse.csr_matrix((dataImplicit['ratingFiltrado'].astype(float), (dataImplicit['user'], dataImplicit['id_item'])))

np.random.seed(1234)
user_item_train, user_item_test = train_test_split(sparse_user_item, train_percentage=0.75)

# Building the model
modelALS = implicit.als.AlternatingLeastSquares(factors=50, regularization=0.1, iterations=20, use_gpu=implicit.cuda.HAS_CUDA, use_cg=True)
modelALS2 = implicit.als.AlternatingLeastSquares(factors=100, regularization=0.1, iterations=20, use_gpu=implicit.cuda.HAS_CUDA, use_cg=True)
modelALS3 = implicit.als.AlternatingLeastSquares(factors=150, regularization=0.1, iterations=20, use_gpu=implicit.cuda.HAS_CUDA, use_cg=True)
modelALS4 = implicit.als.AlternatingLeastSquares(factors=200, regularization=0.1, iterations=20, use_gpu=implicit.cuda.HAS_CUDA, use_cg=True)  # This is the best one
modelALS5 = implicit.als.AlternatingLeastSquares(factors=250, regularization=0.1, iterations=20, use_gpu=implicit.cuda.HAS_CUDA, use_cg=True)
modelALS6 = implicit.als.AlternatingLeastSquares(factors=300, regularization=0.1, iterations=20, use_gpu=implicit.cuda.HAS_CUDA, use_cg=True)

n = 5  # Number of top-N recommendations

# alpha_val = 40
alpha_val = 1
data_conf = (user_item_train * alpha_val).astype('double')

# Test the different algorithms
benchmark = []

# Iterate over all algorithms
for algorithm in [modelALS, modelALS2, modelALS3, modelALS4, modelALS5, modelALS6]:
    algorithm.fit(data_conf)
    algorithm.user_factors
    algorithm.item_factors
    resultadosTotales = ranking_metrics_at_k(algorithm, user_item_train.T.tocsr(), user_item_test.T.tocsr(), K=n, num_threads=4, show_progress=True)
    benchmark.append(resultadosTotales)

pd.DataFrame(benchmark)

modeloSeleccionado = modelALS3

# Get recommendations
user_items_ok = user_item_train.T.tocsr()
recs = modeloSeleccionado.recommend(userid=100452, user_items=user_items_ok, recalculate_user=True)

rituk commented 3 years ago

I'm not sure what the error is, but you can pass the transposed matrix to the model.recommend call. Example:

for item, score in model.recommend(user_id_dict[username], df_weighted_T, filter_items=reference_articles, filter_already_liked_items=1, N=10):

where df_weighted_T is the user-item matrix.

jselma commented 3 years ago

What is the order of the columns of the df_weighted_T dataframe and what is the format of the training data (user-items or items-user)?

What is reference_articles? Thanks

rituk commented 3 years ago

df_weighted_T is user-item. The model is trained on the item-user matrix, which is then transposed to user-item for the model.recommend call. I'll have to look it up, as I'm not near the code. See if this code helps:

https://github.com/benfred/implicit/blob/master/tests/recommender_base_test.py

jselma commented 3 years ago

OK, but the problem is that the metric functions take the user_item matrix format as input. How can I adjust the parameters if the trained model is item_user? Remember that the ranking_metrics_at_k function has this line: cdef int users = test_user_items.shape[0], items = test_user_items.shape[1], so the metrics are reversed.

rituk commented 3 years ago

Any parameter you want to adjust in the model.recommend call is basically a filter applied to the matrix. I think it should be doable.

essefi-ahlem commented 1 year ago

@benfred the example code from your first comment is now throwing this error: IndexError: index 3953 is out of bounds for axis 0 with size 3953

tonyjward commented 1 year ago

@essefi-ahlem if you replace the last line of that example as follows, it will work:

p = precision_at_k(model, train, test, K=10, num_threads=4)
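The IndexError is consistent with the evaluation matrices being oriented differently from the matrix the model was fitted on, although the thread does not confirm the cause. As a sketch of how one might check this before evaluating, the hypothetical helper below compares a matrix shape against the number of rows in the model's two factor matrices (in practice these would come from `model.user_factors.shape[0]` and `model.item_factors.shape[0]`). The sizes 3953 and 6041 are taken from the error message and the progress bar output earlier in the thread.

```python
def check_orientation(n_user_factors, n_item_factors, matrix_shape):
    """Report whether a (rows, cols) matrix lines up with the model's factors."""
    rows, cols = matrix_shape
    if (rows, cols) == (n_user_factors, n_item_factors):
        return "ok"
    if (cols, rows) == (n_user_factors, n_item_factors):
        return "transposed"
    return "mismatch"


# A model whose first factor matrix has 3953 rows and whose second has 6041.
print(check_orientation(3953, 6041, (3953, 6041)))  # matrix as passed to fit
print(check_orientation(3953, 6041, (6041, 3953)))  # the same matrix transposed
```

A "transposed" result would suggest dropping (or adding) the `.T.tocsr()` calls so that the matrices passed to the metric match the one passed to `fit`.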