benfred / implicit

Fast Python Collaborative Filtering for Implicit Feedback Datasets
https://benfred.github.io/implicit/
MIT License

Method .recommend behaves differently for MF-model and Item2Item-model #607

Closed: muxaulmarin closed this issue 2 years ago

muxaulmarin commented 2 years ago

Hello. I prepared a small reproducible example to better explain the problem.

numpy version = 1.23.1
scipy version = 1.8.1
implicit version = 0.6.0

Configure

import numpy
import scipy.sparse
import implicit.als
import implicit.nearest_neighbours
n_users = 1000
n_items = 100
user_id = 55
topk = 10

Create Users2Items matrix

items = numpy.arange(n_items)
interactions = []
for user in range(n_users):
    n_interactions = numpy.random.randint(3, 10)
    user_interactions = numpy.random.choice(items, n_interactions, replace=False)
    for item in user_interactions:
        interactions.append([user, item])
interactions = numpy.array(interactions)
uim = scipy.sparse.coo_matrix(
    (
        numpy.random.rand(interactions.shape[0]) + 1e-3, 
        (interactions[:, 0], interactions[:, 1])
    ),
    shape=(n_users, n_items),
)
uim = uim.tocsr()

Fit ALS model

als = implicit.als.AlternatingLeastSquares(factors=8, iterations=8, random_state=8, use_gpu=False)
als.fit(uim)

Fit Item2Item model

cosine = implicit.nearest_neighbours.CosineRecommender(K=8)
cosine.fit(uim)

Get recommendations

We can compute recommendations without the .recommend method. For AlternatingLeastSquares, the scores are the user's factors multiplied by the item factors; for CosineRecommender, they are the user's interaction row multiplied by the item-item similarity matrix.

als_recs_true = [
    idx_score[0] 
    for idx_score in sorted(
        enumerate(als.user_factors[user_id].dot(als.item_factors.T)), 
        key=lambda x: -x[1]
    )
][:topk]

cos_recs_true = [
    idx_score[0] 
    for idx_score in sorted(
        enumerate(uim[user_id].dot(cosine.similarity).A[0]), 
        key=lambda x: -x[1]
    )
][:topk]
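To make the two scoring rules explicit, here is a minimal sketch in plain Python on a toy dataset, with no numpy or implicit needed. `mf_scores`, `item_item_scores` and `top_k` are hypothetical helpers written for illustration, not part of implicit's API. They mirror the two manual computations above and, like them, also score items the user has already interacted with (the filter_already_liked_items=False setting).

```python
from typing import List, Sequence


def mf_scores(user_vector: Sequence[float],
              item_factors: Sequence[Sequence[float]]) -> List[float]:
    # Matrix factorization: score of item i is dot(user_vector, item_factors[i]),
    # i.e. one row of user_factors @ item_factors.T
    return [sum(u * v for u, v in zip(user_vector, item_vec)) for item_vec in item_factors]


def item_item_scores(user_row: Sequence[float],
                     similarity: Sequence[Sequence[float]]) -> List[float]:
    # Item-item: score of item j is sum_i user_row[i] * similarity[i][j],
    # i.e. the user's interaction row multiplied by the item-item similarity matrix
    n_items = len(similarity[0])
    return [sum(user_row[i] * similarity[i][j] for i in range(len(user_row)))
            for j in range(n_items)]


def top_k(scores: Sequence[float], k: int) -> List[int]:
    # Indices of the k highest scores, best first (ties broken by lower index)
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


# Toy data: 3 items, 2 latent factors
item_factors = [[0.25, 0.0], [1.0, 0.0], [0.0, 1.0]]
user_vector = [1.0, 0.5]
print(mf_scores(user_vector, item_factors))            # [0.25, 1.0, 0.5]
print(top_k(mf_scores(user_vector, item_factors), 2))  # [1, 2]

similarity = [[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.25],
              [0.0, 0.25, 1.0]]
user_row = [1.0, 0.0, 3.0]  # the user interacted with items 0 and 2
print(item_item_scores(user_row, similarity))            # [1.0, 1.25, 3.0]
print(top_k(item_item_scores(user_row, similarity), 3))  # [2, 1, 0]
```

With numpy arrays the same rules are one matrix-vector product each; the pure-Python version is only meant to make the shapes and the role of each input visible.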

Now we get recommendations with the .recommend method in two ways, A and B:

als_recs_a, _ = als.recommend(
    userid=user_id,
    user_items=uim,
    N=topk,
    filter_already_liked_items=False,
    filter_items=None,
    recalculate_user=False,
    items=None
)
als_recs_b, _ = als.recommend(
    userid=0,
    user_items=uim[user_id],
    N=topk,
    filter_already_liked_items=False,
    filter_items=None,
    recalculate_user=False,
    items=None
)
set(als_recs_true) == set(als_recs_a), set(als_recs_true) == set(als_recs_b)
# (True, False)

Way A gives the correct result for AlternatingLeastSquares.

cos_recs_a, _ = cosine.recommend(
    userid=user_id,
    user_items=uim,
    N=topk,
    filter_already_liked_items=False,
    filter_items=None,
    recalculate_user=False,
    items=None
)
cos_recs_b, _ = cosine.recommend(
    userid=0,
    user_items=uim[user_id],
    N=topk,
    filter_already_liked_items=False,
    filter_items=None,
    recalculate_user=False,
    items=None
)
set(cos_recs_true) == set(cos_recs_a), set(cos_recs_true) == set(cos_recs_b)
# (False, True)

Way B gives the correct result for CosineRecommender.

It seems that way A should be correct for both models.

benfred commented 2 years ago

You're not calling the models correctly with either your A or B option, which is why you're getting different results from what you'd expect.

Can you try with:

cosine.recommend(
    userid=user_id,
    user_items=uim[user_id]
)

You'll need to set the other filtering options too, but where you're going wrong is in the userid / user_items parameters: you're passing the wrong userid in option B, and the wrong user_items in option A. For a single user, user_items should be that user's row (uim[user_id]), passed together with userid=user_id.

I should probably make the cosine model throw an error to highlight this earlier. However, I believe the ALS model would have thrown an error if you had specified either the recalculate_user or filter_already_liked_items option (without those, the MF models don't need the sparse matrix and ignore the user_items parameter).
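The difference between the two models can be illustrated with a small, self-contained sketch. It assumes, following the explanation above, that the MF model only uses userid (to look up the user's factors) while the item-item model only uses the interaction row passed in as user_items. Under that assumption, way A and way B each happen to work for one model only. `toy_mf_recommend` and `toy_item_item_recommend` are hypothetical stand-ins, not implicit's actual implementation, and the row-count check in the item-item version is the kind of early error suggested above, not something implicit 0.6.0 is known to raise.

```python
from typing import List

Matrix = List[List[float]]


def _top_n(scores: List[float], n: int) -> List[int]:
    # Indices of the n highest scores, best first (ties broken by lower index)
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:n]


def toy_mf_recommend(userid: int, user_items: Matrix, user_factors: Matrix,
                     item_factors: Matrix, N: int) -> List[int]:
    # Hypothetical MF recommender: only `userid` is used, to look up the user's
    # factors; `user_items` is accepted but ignored (no recalculation/filtering).
    u = user_factors[userid]
    scores = [sum(a * b for a, b in zip(u, item)) for item in item_factors]
    return _top_n(scores, N)


def toy_item_item_recommend(userid: int, user_items: Matrix, similarity: Matrix,
                            N: int) -> List[int]:
    # Hypothetical item-item recommender: scores come only from the interaction
    # row passed in, so `userid` does not select any data here. For a single
    # userid, require exactly one row instead of silently scoring the wrong one.
    if len(user_items) != 1:
        raise ValueError("user_items must contain exactly one row for a single userid")
    row = user_items[0]
    n_items = len(similarity[0])
    scores = [sum(row[i] * similarity[i][j] for i in range(len(row)))
              for j in range(n_items)]
    return _top_n(scores, N)


# Toy data: 3 users, 3 items
user_factors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
item_factors = [[1.0, 0.0], [0.0, 1.0], [0.25, 0.25]]
interactions = [[1.0, 0.0, 0.0],   # user 0
                [0.0, 0.0, 1.0],   # user 1
                [0.0, 1.0, 0.0]]   # user 2
similarity = [[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.25],
              [0.5, 0.25, 1.0]]

uid = 1
# Matching userid and user_items row: consistent for both toy models
print(toy_mf_recommend(uid, [interactions[uid]], user_factors, item_factors, N=2))  # [1, 2]
print(toy_item_item_recommend(uid, [interactions[uid]], similarity, N=2))           # [2, 0]

# "Way B": userid=0 with user 1's row, so the MF model scores user 0 instead
print(toy_mf_recommend(0, [interactions[uid]], user_factors, item_factors, N=2))    # [0, 2]
```

Passing the full interaction matrix with a single userid ("way A") makes `toy_item_item_recommend` raise a ValueError instead of returning scores for whichever row it happens to read.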

I tried to show how to use the recommend API in the tutorial notebook: https://benfred.github.io/implicit/tutorial_lastfm.html#Making-Recommendations

# Get recommendations for a single user
userid = 12345
ids, scores = model.recommend(userid, user_plays[userid], N=10, filter_already_liked_items=False)

muxaulmarin commented 2 years ago

Thanks for the answer, but unfortunately it does not solve the problem for me.

benfred commented 2 years ago


Sorry to hear that you're still having a problem. Let me know if you need any help solving it.