PreferredAI / cornac

A Comparative Framework for Multimodal Recommender Systems
https://cornac.preferred.ai
Apache License 2.0

[BUG] The user graph is not updated correctly when using cross-validation #325

Closed zwhe99 closed 4 years ago

zwhe99 commented 4 years ago

Description

I am trying to add a recommender model that uses a social network. I need to access the 'trustees' of each 'trustor' in the training set, so I wrote the following code:

        # for each user in train_set
        for i in train_set.user_indices:
            S_i = train_set.user_graph.batch(i).data  # IndexError: row index (1484) out of range

The line marked by the comment then raises the following error:

Traceback (most recent call last):
  File "D:/gitRepo/OLADSR(refactored)/test.py", line 97, in <module>
    eval_method=cv_split, models=[rs_dsr, rs_sorec], metrics=[rmse, ndcgs, pre, rec]
  File "D:\Anaconda3\envs\DSR\lib\site-packages\cornac\experiment\experiment.py", line 130, in run
    show_validation=self.show_validation,
  File "D:\Anaconda3\envs\DSR\lib\site-packages\cornac\eval_methods\cross_validation.py", line 136, in evaluate
    self, new_model, metrics, user_based, show_validation=False
  File "D:\Anaconda3\envs\DSR\lib\site-packages\cornac\eval_methods\base_method.py", line 584, in evaluate
    model.fit(self.train_set, self.val_set)
  File "D:\Anaconda3\envs\DSR\lib\site-packages\cornac\hyperopt.py", line 146, in fit
    model = self.model.clone(params).fit(train_set, val_set)
  File "D:\gitRepo\OLADSR(refactored)\models\dsr\recom_dsr.py", line 84, in fit
    self.maxItr2)
  File "D:\gitRepo\OLADSR(refactored)\models\dsr\recom_dsr.py", line 141, in dcd_b
    S_i = train_set.user_graph.batch(i).data
  File "D:\Anaconda3\envs\DSR\lib\site-packages\cornac\data\graph.py", line 148, in batch
    return self.matrix[batch_ids]
  File "D:\Anaconda3\envs\DSR\lib\site-packages\scipy\sparse\_index.py", line 35, in __getitem__
    row, col = self._validate_indices(key)
  File "D:\Anaconda3\envs\DSR\lib\site-packages\scipy\sparse\_index.py", line 135, in _validate_indices
    raise IndexError('row index (%d) out of range' % row)
IndexError: row index (1484) out of range

I set a breakpoint where the error occurs and found that, at that point, train_set.user_graph.matrix has fewer rows than train_set.num_users.

I don't know whether this is a mistake in my code or a bug in Cornac, but the error only appears in the later folds of cross-validation; the first fold usually works correctly. So I suspect that user_graph is not updated correctly for the later folds.
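The breakpoint finding (fewer matrix rows than train_set.num_users) is exactly what makes the lookup fail. The sketch below uses hypothetical pure-Python helpers that mimic how a CSR (compressed sparse row) matrix stores rows; it is not SciPy's or Cornac's actual code. indptr has one entry per row plus one, row i spans indices[indptr[i]:indptr[i+1]], and any i at or beyond the number of rows is out of range. It also highlights a detail worth checking in the loop above, assuming user_graph.batch(i) returns a SciPy CSR row as the traceback suggests: the column indices (the trustees) are in .indices, while .data holds the stored edge values.

```python
from typing import List, Tuple

# Hypothetical pure-Python CSR helpers (not SciPy's or Cornac's implementation).
CSR = Tuple[List[int], List[int], List[float]]


def build_csr(n_rows: int, edges: List[Tuple[int, int, float]]) -> CSR:
    """Build CSR arrays (indptr, indices, data) from (row, col, value) triplets."""
    # Count the stored entries in each row.
    counts = [0] * n_rows
    for r, _, _ in edges:
        counts[r] += 1
    # indptr[i] is where row i starts; indptr[n_rows] is the total entry count.
    indptr = [0] * (n_rows + 1)
    for i in range(n_rows):
        indptr[i + 1] = indptr[i] + counts[i]
    # Place each entry in its row's slot, keeping insertion order within a row.
    indices = [0] * len(edges)
    data = [0.0] * len(edges)
    fill = indptr[:-1]  # slicing a list makes a copy
    for r, c, v in edges:
        indices[fill[r]] = c
        data[fill[r]] = v
        fill[r] += 1
    return indptr, indices, data


def row(csr: CSR, i: int) -> Tuple[List[int], List[float]]:
    """Return (column indices, stored values) of row i."""
    indptr, indices, data = csr
    n_rows = len(indptr) - 1
    if not 0 <= i < n_rows:
        raise IndexError("row index (%d) out of range" % i)
    start, end = indptr[i], indptr[i + 1]
    return indices[start:end], data[start:end]


# Made-up trust edges (trustor, trustee, weight) in a matrix sized for 3 users.
csr = build_csr(3, [(0, 1, 1.0), (0, 2, 1.0), (2, 0, 1.0)])
trustees, weights = row(csr, 0)
print(trustees, weights)  # [1, 2] [1.0, 1.0]

# A fold with a 4th user (index 3) cannot be served by this 3-row matrix.
try:
    row(csr, 3)
except IndexError as err:
    print(err)  # row index (3) out of range
```

With unweighted trust edges, every value in .data is the same, so reading .data alone tells you how many trustees a user has but not who they are.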

On which platform does it happen?

OS: Windows 10
Python: 3.6
Cornac: 1.4.1

How do we replicate the issue?

Run the script below:

from cornac.exception import ScoreException
from cornac.models import Recommender
import numpy as np
import cornac.hyperopt

class Test(Recommender):

    def monitor_value(self):
        pass

    def __init__(
            self,
            name="Test",
            trainable=True,
            verbose=False,
            seed=None):
        Recommender.__init__(self, name=name, trainable=trainable, verbose=verbose)

        self.seed = seed
        self.U, self.V = None, None

    def fit(self, train_set: cornac.data.dataset.Dataset, val_set=None):
        Recommender.fit(self, train_set, val_set)
        num_users = train_set.num_users
        num_items = train_set.num_items
        self.U = np.random.rand(8, num_users)
        self.V = np.random.rand(8, num_items)

        # for each user
        for i in train_set.user_indices:
            S_i = train_set.user_graph.batch(i).data  # index out of range here

        return self

    def score(self, user_id, item_id=None):
        if item_id is None:
            if self.train_set.is_unk_user(user_id):
                raise ScoreException(
                    "Can't make score prediction for (user_id=%d)" % user_id
                )

            known_item_scores = self.V.T.dot(self.U[:, user_id])
            return known_item_scores
        else:
            if self.train_set.is_unk_user(user_id) or self.train_set.is_unk_item(
                    item_id
            ):
                raise ScoreException(
                    "Can't make score prediction for (user_id=%d, item_id=%d)"
                    % (user_id, item_id)
                )

            user_pred = self.V[:, item_id].dot(self.U[:, user_id])
            return user_pred

if __name__ == "__main__":
    from cornac.data import GraphModality
    from cornac.eval_methods import CrossValidation
    from cornac.experiment import Experiment
    from cornac import metrics
    from cornac.datasets import filmtrust

    ratings = filmtrust.load_feedback()
    trust = filmtrust.load_trust()

    user_graph_modality = GraphModality(data=trust)

    cv_split = CrossValidation(
        data=ratings,
        n_folds=5,
        exclude_unknowns=True,
        rating_threshold=0.0,
        user_graph=user_graph_modality,
        verbose=True,
    )

    # Instantiate the minimal test model defined above
    test = Test()

    # Evaluation metrics
    ndcg = metrics.NDCG(k=-1)
    rmse = metrics.RMSE()
    rec = metrics.Recall(k=20)
    pre = metrics.Precision(k=20)

    # Put everything together into an experiment and run it
    Experiment(
        eval_method=cv_split, models=[test], metrics=[rmse, ndcg, pre, rec]
    ).run()

Expected behavior (i.e. solution)

If this is a bug, please fix it; if not, please tell me how to modify my code to achieve this.

Other Comments

zwhe99 commented 4 years ago

I found that removing the "if self.__matrix is None:" check from the GraphModality.matrix property fixes this error. Because of that check, user_graph.matrix was not rebuilt for the next fold.
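The mechanism described in this comment can be reproduced with a small, hypothetical stand-in for a graph modality. The class and method names below are illustrative, not Cornac's GraphModality source, and a dense list of lists replaces the sparse matrix to keep the sketch dependency-free. The matrix property is built lazily behind an "is None" check, so a second build() call for the next fold (with more users) still returns the first fold's matrix.

```python
class LazyGraph:
    """Hypothetical graph modality with a lazily cached adjacency matrix."""

    def __init__(self, edges):
        self.raw_edges = edges   # (trustor_id, trustee_id) pairs with raw ids
        self.triplets = []       # (row, col, value) after mapping ids to indices
        self.num_users = 0
        self.__matrix = None     # cache, filled on first access only

    def build(self, id_map):
        """Map raw user ids to the current fold's indices (called once per fold)."""
        self.num_users = len(id_map)
        self.triplets = [(id_map[u], id_map[v], 1.0)
                         for u, v in self.raw_edges
                         if u in id_map and v in id_map]
        return self

    @property
    def matrix(self):
        # The problematic pattern: build() never resets the cache.
        if self.__matrix is None:
            dense = [[0.0] * self.num_users for _ in range(self.num_users)]
            for r, c, v in self.triplets:
                dense[r][c] = v
            self.__matrix = dense
        return self.__matrix


graph = LazyGraph([("a", "b"), ("b", "c"), ("c", "d")])
graph.build({"a": 0, "b": 1, "c": 2})          # fold 1: 3 users
print(len(graph.matrix))                        # 3
graph.build({"a": 0, "b": 1, "c": 2, "d": 3})  # fold 2: 4 users
print(len(graph.matrix))                        # still 3: stale cache
```

In this sketch, indexing graph.matrix[3] after the second build raises IndexError, mirroring the "row index out of range" error in the later folds.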

saghiles commented 4 years ago

Thanks for pointing out this issue. Indeed, the matrix is not updated correctly across folds. Skipping the "if self.__matrix is None:" check fixes the problem. However, this is not an efficient solution, since the matrix is then rebuilt on every access. We are working on a more efficient solution. In the meantime, we suggest removing the if statement or using RatioSplit instead.

Thank you!

zwhe99 commented 4 years ago

Maybe you can construct the matrix at the end of GraphModality._build_triplet().
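A minimal sketch of this suggestion, again on a hypothetical simplified class rather than Cornac's real code (the method name mirrors the one mentioned in the comment, but its body is illustrative, and the fix actually merged later may differ). The matrix is constructed once at the end of the step that recomputes the triplets, so each fold gets a fresh matrix, while property access stays a plain lookup instead of rebuilding on every call. The num_builds counter is an illustrative addition.

```python
class EagerGraph:
    """Hypothetical graph modality that rebuilds its matrix once per build() call."""

    def __init__(self, edges):
        self.raw_edges = edges   # (trustor_id, trustee_id) pairs with raw ids
        self.num_users = 0
        self.num_builds = 0      # counts matrix constructions, for illustration
        self._matrix = None

    def _build_triplet(self, id_map):
        """Map raw ids to indices, then construct the matrix for this fold."""
        self.num_users = len(id_map)
        triplets = [(id_map[u], id_map[v], 1.0)
                    for u, v in self.raw_edges
                    if u in id_map and v in id_map]
        # Build the matrix here, at the end of the triplet step.
        dense = [[0.0] * self.num_users for _ in range(self.num_users)]
        for r, c, v in triplets:
            dense[r][c] = v
        self._matrix = dense
        self.num_builds += 1

    def build(self, id_map):
        self._build_triplet(id_map)
        return self

    @property
    def matrix(self):
        return self._matrix  # no rebuild on access


graph = EagerGraph([("a", "b"), ("b", "c"), ("c", "d")])
graph.build({"a": 0, "b": 1, "c": 2})          # fold 1
graph.build({"a": 0, "b": 1, "c": 2, "d": 3})  # fold 2
for _ in range(10):
    _ = graph.matrix                            # repeated access, no rebuild
print(len(graph.matrix), graph.num_builds)      # 4 2
```

The trade-off compared with simply dropping the "is None" check is that the cost is paid once per fold rather than once per matrix access.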

saghiles commented 4 years ago

The problem with the adjacency matrix in the graph modality is now solved (PR 333), so I will close this issue.