benfred / implicit

Fast Python Collaborative Filtering for Implicit Feedback Datasets
https://benfred.github.io/implicit/
MIT License

als explain method bug #701

Open eostendarp opened 1 year ago

eostendarp commented 1 year ago

I'm encountering an issue when running the explain method. I'm unsure of what is going wrong, but it seems like the dimensions of some matrix are being unintentionally flipped at some point.

The only userid that appears to dodge the issue is 0, but there is no such user in the training data I'm working with.

Below is code and output. Any feedback would be greatly appreciated!

import threadpoolctl
import numpy as np
from scipy import sparse
import implicit

threadpoolctl.threadpool_limits(1, 'blas')

matrix = np.loadtxt('./favs-2023-10-24.csv', dtype=np.uintc, delimiter=',')
user_post = sparse.csr_matrix((np.ones(matrix.shape[0], dtype=np.bool_), (matrix[:, 1], matrix[:, 0])))

model = implicit.als.AlternatingLeastSquares(factors=256, regularization=0.01, alpha=40, dtype=np.float32, iterations=50, calculate_training_loss=True)
model.fit(user_post)

user_id = 23
ids, scores = model.recommend(user_id, user_post[user_id], N=10, filter_already_liked_items=False)
print({'post_id': ids, 'score': scores, 'already_fav\'d': np.in1d(ids, user_post[user_id].indices)})

{'post_id': array([127664, 105085, 160655, 205782, 187429, 185678, 188119, 265365, 177336, 220538], dtype=int32), 'score': array([0.38005394, 0.3619479 , 0.35976228, 0.3480073 , 0.34047693, 0.34022546, 0.33973548, 0.33651435, 0.33490524, 0.3335871 ], dtype=float32), "already_fav'd": array([False, False, False, True, True, True, True, True, True, False])}

cpu_model = model.to_cpu()

user_id = 23
ids, scores = cpu_model.recommend(user_id, user_post[user_id], N=10, filter_already_liked_items=False)
print({'post_id': ids, 'score': scores, 'already_fav\'d': np.in1d(ids, user_post[user_id].indices)})

{'post_id': array([127664, 105085, 160655, 205782, 187429, 185678, 188119, 265365, 177336, 220538], dtype=int32), 'score': array([0.38005394, 0.36194786, 0.35976222, 0.3480073 , 0.34047693, 0.34022546, 0.33973545, 0.33651435, 0.33490527, 0.3335871 ], dtype=float32), "already_fav'd": array([False, False, False, True, True, True, True, True, True, False])}

cpu_model.explain(user_id, user_post[user_id], 127664)

IndexError                                Traceback (most recent call last)
/home/eostendarp/workspace/e621/notebook.ipynb Cell 10 line 1
----> 1 cpu_model.explain(23, user_post[user_id], 127664)

File ~/workspace/e621/venv/lib/python3.10/site-packages/implicit/cpu/als.py:386, in AlternatingLeastSquares.explain(self, userid, user_items, itemid, user_weights, N)
    383 # user_weights = Cholesky decomposition of Wu^-1
    384 # from section 5 of the paper CF for Implicit Feedback Datasets
    385 if user_weights is None:
--> 386     A, _ = user_linear_equation(
    387         self.item_factors, self.YtY, user_items, userid, self.regularization, self.factors
    388     )
    389     user_weights = scipy.linalg.cho_factor(A)
    390 seed_item = self.item_factors[itemid]

File ~/workspace/e621/venv/lib/python3.10/site-packages/implicit/cpu/als.py:503, in user_linear_equation(Y, YtY, Cui, u, regularization, n_factors)
    500 # accumulate YtCuPu in b
    501 b = np.zeros(n_factors)
--> 503 for i, confidence in nonzeros(Cui, u):
    504     factor = Y[i]
    506     if confidence > 0:

File ~/workspace/e621/venv/lib/python3.10/site-packages/implicit/utils.py:11, in nonzeros(m, row)
      9 def nonzeros(m, row):
     10     """returns the non zeroes of a row in csr_matrix"""
---> 11     for index in range(m.indptr[row], m.indptr[row + 1]):
     12         yield m.indices[index], m.data[index]

IndexError: index 23 is out of bounds for axis 0 with size 2

gtfuhr commented 3 months ago

I was facing the same issue, @eostendarp. Here's the solution in case anyone else runs into this in the future. Change the line:

cpu_model.explain(user_id, user_post[user_id], 127664)

to:

cpu_model.explain(user_id, user_post, 127664)

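The explain function uses user_id internally as a row index into the matrix you pass in, so it needs the full user_post matrix rather than a pre-sliced row, as mentioned by the lib creator in this other github issue. A one-row slice like user_post[user_id] has an indptr array of length 2, which is why the error reports "size 2" and why userid 0 is the only value that doesn't raise.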
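For anyone curious why the pre-sliced row fails, here is a minimal stdlib-only sketch of the mechanism. The CSR namedtuple and the slice_row helper are hypothetical stand-ins I made up for illustration (they are not part of implicit or scipy); they only mimic the indptr/indices/data layout that a scipy csr_matrix exposes. The nonzeros function mirrors the helper shown in the traceback.

```python
from collections import namedtuple

# Hypothetical stand-in for a scipy.sparse CSR matrix: same three arrays,
# but plain Python lists instead of NumPy arrays.
CSR = namedtuple("CSR", ["indptr", "indices", "data"])


def nonzeros(m, row):
    """Yield (column, value) pairs for one row, like the helper in the traceback."""
    for index in range(m.indptr[row], m.indptr[row + 1]):
        yield m.indices[index], m.data[index]


def slice_row(m, row):
    """Mimic user_post[row]: a one-row matrix whose indptr has length 2."""
    start, end = m.indptr[row], m.indptr[row + 1]
    return CSR([0, end - start], m.indices[start:end], m.data[start:end])


# Three users: user 0 likes nothing, user 1 likes items 4 and 7,
# user 2 likes item 5.
user_post = CSR(indptr=[0, 0, 2, 3], indices=[4, 7, 5], data=[1, 1, 1])
user_id = 2

# Full matrix: row 2 exists, so the lookup works.
full_result = list(nonzeros(user_post, user_id))

# Pre-sliced row: only row 0 exists, so looking up row 2 goes out of bounds.
one_row = slice_row(user_post, user_id)
try:
    list(nonzeros(one_row, user_id))
    sliced_error = None
except IndexError as exc:
    sliced_error = exc

print(full_result)
print(type(sliced_error).__name__)
```

With the full matrix the row lookup succeeds, while the one-row slice only has a valid row 0, which matches the behaviour described in the original report.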