NicolasHug / Surprise

A Python scikit for building and analyzing recommender systems
http://surpriselib.com
BSD 3-Clause "New" or "Revised" License

Cosine similarity between two particular users is 1.0 whereas they have different ratings for a common item #483

Open deil87 opened 3 months ago

deil87 commented 3 months ago

Description

I was trying to create my own custom algorithm, and before moving on to something more complicated I wanted to make sure I understand how the current code works. I couldn't understand why the cosine similarity between two particular users is 1.0 even though they have different ratings for a common item.

Steps/Code to Reproduce

from surprise import AlgoBase, BaselineOnly, NormalPredictor
from surprise import PredictionImpossible

class MyAlgorithm(AlgoBase):
    def __init__(self, sim_options={}, bsl_options={}):

        AlgoBase.__init__(self, sim_options=sim_options, bsl_options=bsl_options)

    def fit(self, trainset):

        AlgoBase.fit(self, trainset)

        # Compute baselines and similarities
        self.bu, self.bi = self.compute_baselines()
        self.sim = self.compute_similarities()

        return self

    def estimate(self, u, i):

        if not (self.trainset.knows_user(u) and self.trainset.knows_item(i)):
            raise PredictionImpossible("User and/or item is unknown.")

        # Compute similarities between u and v, where v describes all other
        # users that have also rated item i.

        neighbors = [(v, self.sim[u, v]) for (v, r_ignored) in self.trainset.ir[i]]  
        # Sort these neighbors by similarity
        neighbors = sorted(neighbors, key=lambda x: x[1], reverse=True)

        print(f"estimate({u}, {i}):")
        print("The 3 nearest neighbors of user", str(u), "are:")
        for v, sim_uv in neighbors[:3]:
            print(f"user {v} with sim {sim_uv:1.2f}")

        # ... Aaaaand return the baseline estimate anyway ;)
        bsl = self.trainset.global_mean + self.bu[u] + self.bi[i]
        return bsl

Preparing the data (taking a subsample of 20K rows to make exploration/investigation faster):

# import urllib.request
# urllib.request.urlretrieve("http://files.grouplens.org/datasets/movielens/ml-1m.zip", "ml-1m.zip")

from surprise import Dataset, Reader, accuracy
from surprise.model_selection import train_test_split

reader = Reader(rating_scale=(1, 5))
data_full_lr_20k = data_full_lr[:20000]
data_full = Dataset.load_from_df(data_full_lr_20k[["user", "item", "label"]], reader)

trainset_full, testset_full = train_test_split(data_full, test_size=0.25, random_state=42)

Then I'm running my new custom algo:

bsl_options = {"method": "als", "n_epochs": 5, "reg_u": 12, "reg_i": 5}
sim_options = {
    "name": "cosine",
    "user_based": True,  # compute  similarities between users
}

algo = MyAlgorithm(bsl_options=bsl_options, sim_options=sim_options)

# train and test algorithm.
algo.fit(trainset_full)
predictions = algo.test(testset_full)

# Compute and print Root Mean Squared Error
accuracy.rmse(predictions, verbose=True)

Expected Results

In the printout (the one provided in the docs) I was looking for some neighbors with non-zero similarity, and for example took this one:

estimate(171, 428)
The 3 nearest neighbours of user 171 are:
user 1123 with sim 1.00
user 450 with sim 0.00
user 455 with sim 0.00

Note that I also added logging of the current estimate function's parameters, to know which item we are predicting for (428 in this example).

So I see that the algo considers users 171 and 1123 to be similar. I decided to check this manually.

As "user_based": True then we are calculating similarity between user 171 and other users, that have ratings for 428.

So I checked trainset_full.ir[428]. Output:

[(450, 1.0),
 (455, 2.0),
 (2683, 3.0),
 (1715, 2.0),
 (2819, 2.0),
 (2928, 2.0),
 (1654, 1.0),
 (2179, 3.0),
 (49, 3.0),
 (621, 2.0),
 (3757, 3.0),
 (324, 1.0),
 (2780, 2.0),
 (1123, 2.0)]  <--- this one: user 1123 rated item 428 with rating 2.0. Looks OK so far (i.e. it qualified into the neighbourhood).

Then I decided to check the ratings of these two users, 171 and 1123, to see whether they have similar ratings for common items.

# trainset_full.ur[171]
sorted(trainset_full.ur[171], key=lambda tup: tup[0])

Output:
[(32, 4.0),
 (116, 4.0),
 (160, 2.0),
 (322, 4.0),
 (476, 3.0),
 (513, 2.0),
 (540, 4.0),
 (891, 2.0),
 (896, 4.0),
 (919, 3.0),
 (1034, 3.0),
 (1277, 4.0),
 (1921, 4.0),
 (2106, 3.0),
 (2134, 1.0),
 (2152, 5.0),
 (2493, 3.0)]

sorted(trainset_full.ur[1123], key=lambda tup: tup[0])

[(54, 4.0),
 (182, 3.0),
 (281, 4.0),
 (292, 4.0),
 (428, 2.0),
 (459, 5.0),
 (654, 5.0),
 (891, 3.0),
 (1019, 2.0),
 (1167, 5.0),
 (1210, 3.0),
 (1557, 4.0),
 (1637, 5.0),
 (1666, 3.0)]

I found only one common item, and the ratings are different:

(891, 2.0)  vs   (891, 3.0),

I didn't specify min_support, but that shouldn't matter: when the support is below min_support the similarity is set to 0, and here it is non-zero, so the support must be greater than or equal to min_support (the default is presumably 1).

I would expect the similarity not to equal 1, as the ratings are not the same (2.0 vs 3.0).
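The 1.0 can be reproduced by hand: with a single common item the rating vectors collapse to one dimension, and any two one-dimensional vectors of positive ratings are perfectly aligned:

```python
from math import sqrt

a, b = [2.0], [3.0]  # ratings of users 171 and 1123 on their only common item, 891
dot = sum(x * y for x, y in zip(a, b))  # 6.0
norm_a = sqrt(sum(x * x for x in a))    # 2.0
norm_b = sqrt(sum(x * x for x in b))    # 3.0
print(dot / (norm_a * norm_b))          # 1.0
```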

Actual Results

The 3 nearest neighbours of user 171 are: user 1123 with sim 1.00

Versions

macOS-10.16-x86_64-i386-64bit Python 3.10.14 (main, May 6 2024, 14:47:20) [Clang 14.0.6 ] surprise 1.1.4

deil87 commented 3 months ago

I think the problem here is that cosine similarity fails to see any difference when the support vector is one-dimensional.

To highlight my point I will use ratings of 5 vs 1, instead of the 3 and 2 from my example above.

a= [5], b= [1]
(a·b) / (‖a‖ × ‖b‖) = 5 / (5.000000 × 1.000000) = 1.000000

Even if we had a second dimension with equal values, we would get a slightly more informative result (the measure would start to notice some difference between the vectors):

a = [5, 1], b = [1, 1]

a·b = 5×1 + 1×1 = 6

‖a‖ = √(5² + 1²) = 5.099020 and
‖b‖ = √(1² + 1²) = 1.414214

(a·b) / (‖a‖ × ‖b‖) = 6 / (5.099020 × 1.414214) = 0.832050
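Both calculations above can be checked with a few lines of plain Python:

```python
from math import sqrt

def cosine(a, b):
    """Plain cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

print(cosine([5.0], [1.0]))            # 1.0  (any two positive 1-D vectors)
print(cosine([5.0, 1.0], [1.0, 1.0]))  # ~0.832050
```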

So I believe that min_support should always be >= 2 when cosine similarity is specified in the parameters.
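Concretely, that would mean passing min_support in sim_options (it is a supported key there; this is just my suggested workaround, not a library default), so that pairs whose overlap is a single item fall back to similarity 0:

```python
sim_options = {
    "name": "cosine",
    "user_based": True,
    "min_support": 2,  # pairs with fewer than 2 co-rated items get similarity 0
}
```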