benfred / implicit

Fast Python Collaborative Filtering for Implicit Feedback Datasets
https://benfred.github.io/implicit/
MIT License
3.57k stars, 612 forks

similar_users throws IndexError, but works on similar_items #567

Closed MDTsai closed 2 years ago

MDTsai commented 2 years ago

I am trying to use implicit on a buyer, supplier, shipment dataset. Following the lastfm example, I prepared a sparse CSR matrix with suppliers as rows, buyers as columns, and shipments as values, analogous to artist_user_plays = get_lastfm():

supplier_buyer_shipment = sparse.csr_matrix((df['Shipment'], (df['supplier_id'], df['buyer_id'])))

supplier_buyer_shipment.shape is (846871, 1148973), which matches the number of unique suppliers and buyers.

I didn't change the model training arguments, so the code looks like the lastfm example:

import implicit
from implicit.nearest_neighbours import bm25_weight
supplier_buyer_shipment = bm25_weight(supplier_buyer_shipment, K1=100, B=0.8)

model = implicit.als.AlternatingLeastSquares(factors=64, regularization=0.05)
model.fit(2 * supplier_buyer_shipment)

When I want to find a similar buyer, I use:

buyerid = 919366
ids, scores = model.similar_users(buyerid)

and I got this error:

Traceback (most recent call last):
  File "/Users/Eric/recommend.py", line 32, in <module>
    ids, scores = model.similar_users(buyerid)
  File "/Users/Eric/env/lib/python3.9/site-packages/implicit/cpu/matrix_factorization_base.py", line 146, in similar_users
    norm = norms[userid]
IndexError: index 919366 is out of bounds for axis 0 with size 846871

This looks strange: 846871 is the number of suppliers, not buyers (the users), which is why the index is out of bounds. So I changed the call to:

buyerid = 919366
ids, scores = model.similar_items(buyerid)

This works and returns the ids of similar buyers.

I don't understand why, but I will keep looking to see whether this is a problem on my side or in implicit.

MDTsai commented 2 years ago

I see the problem:

From als.py

rows of the matrix are the users, the columns are the items liked that user

which is different from the tutorial

with the each row corresponding to a different musician and each column corresponding to a different user.
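
The mismatch can be reproduced without training a model. The sketch below uses only NumPy and SciPy on a made-up 3 x 5 matrix. `id_in_range` is a hypothetical helper, not part of implicit. It assumes, per the als.py docstring quoted above, that the rows of the matrix passed to fit are treated as users and the columns as items.

```python
import numpy as np
from scipy import sparse

# Toy stand-in for the supplier x buyer matrix from the report:
# 3 suppliers (rows) and 5 buyers (columns). The values are made up.
rows = np.array([0, 1, 2, 2])
cols = np.array([0, 4, 1, 3])
vals = np.array([10.0, 3.0, 7.0, 1.0])
supplier_buyer = sparse.csr_matrix((vals, (rows, cols)), shape=(3, 5))


def id_in_range(matrix, idx, axis):
    """Hypothetical helper: True if idx is a valid index along the given axis."""
    return 0 <= idx < matrix.shape[axis]


buyer_id = 4
# Assumption: rows are read as "users", so a buyer id is checked against
# the number of suppliers and can fall out of range.
valid_as_user = id_in_range(supplier_buyer, buyer_id, axis=0)
# Columns are read as "items", so the same buyer id is a valid item index.
valid_as_item = id_in_range(supplier_buyer, buyer_id, axis=1)
print(valid_as_user, valid_as_item)
```

This mirrors the behaviour in the report: the buyer id is rejected on the user axis but accepted on the item axis, which would explain why similar_items appeared to work.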

benfred commented 2 years ago

I think the tutorial is correct, since later on in the tutorial we compute the transpose of the artist_user_plays before passing to model.fit:

# get the transpose since the most of the functions in implicit expect (user, item) sparse matrices instead of (item, user)
user_plays = artist_user_plays.T.tocsr()

I think this could be clearer though -
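
Applied to the dataset in this issue, the same transpose step might look like the sketch below. It uses a made-up 3 x 4 matrix and only NumPy and SciPy; the variable names are illustrative and do not come from the tutorial.

```python
import numpy as np
from scipy import sparse

# Toy supplier x buyer shipments (suppliers as rows, buyers as columns).
supplier_ids = np.array([0, 0, 1, 2])
buyer_ids = np.array([1, 3, 3, 0])
shipments = np.array([5.0, 2.0, 8.0, 1.0])
supplier_buyer = sparse.csr_matrix(
    (shipments, (supplier_ids, buyer_ids)), shape=(3, 4)
)

# Transpose so buyers (the "users") index the rows. Transposing a CSR
# matrix yields CSC, so convert back to CSR afterwards.
buyer_supplier = supplier_buyer.T.tocsr()

print(supplier_buyer.shape)
print(buyer_supplier.shape)
```

With the real data, `buyer_supplier` would be the matrix passed to model.fit, so that buyer ids index similar_users and supplier ids index similar_items.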

MDTsai commented 2 years ago

Thanks for the information. Closing this issue.