caserec / CaseRecommender

Case Recommender: A Flexible and Extensible Python Framework for Recommender Systems
MIT License

How to evaluate CoRec enriched rating matrix vs original one? #34

Closed liv1n9 closed 4 years ago

liv1n9 commented 5 years ago

Hi, I have a question about evaluating the CoRec-enriched rating matrix versus the original matrix. I'm doing k-fold cross-validation, which divides the dataset into k folds (k = 10). For each fold, I take it as the test set and the other folds as the train set; after preprocessing, I train User-KNN and Item-KNN on the original data to get the prediction sets (which will be combined with the test set to calculate RMSE).
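To make the protocol concrete, here is a minimal sketch of the k-fold loop described above, in pure Python. The toy ratings and the global-mean predictor standing in for User-KNN/Item-KNN are hypothetical placeholders, not CaseRecommender API calls:

```python
import math
import random

# Toy ratings: (user, item, rating) triples (hypothetical data).
ratings = [(u, i, float((u * 7 + i * 3) % 5 + 1))
           for u in range(6) for i in range(8)]

def rmse(pairs):
    """Root mean squared error over (predicted, actual) pairs."""
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

def k_fold_rmse(data, k=10, seed=0):
    """For each fold: hold it out as the test set, train on the rest,
    and score predictions against the held-out ratings."""
    data = data[:]
    random.Random(seed).shuffle(data)
    folds = [data[f::k] for f in range(k)]
    scores = []
    for f in range(k):
        test = folds[f]
        train = [r for g in range(k) if g != f for r in folds[g]]
        # Placeholder predictor: the global mean of the train ratings
        # (a real run would use User-KNN / Item-KNN here).
        mean = sum(r for _, _, r in train) / len(train)
        scores.append(rmse([(mean, r) for _, _, r in test]))
    return sum(scores) / k

print(round(k_fold_rmse(ratings), 3))
```

The key property is that every rating lands in exactly one test fold, so each (u, i) pair is scored exactly once across the k runs.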

Then I use CoRec to generate two enriched labeled sets. The problem is that if a pair (u, i) belongs to the test set of the original data but is labeled in the enriched labeled sets, then when we predict using the new labeled sets, this pair won't be predicted. It can still be predicted if we set the recommender's test-set parameter to the original test set, but in that case the pair is predicted even though it has already been assigned a rating, which confuses me. Is it okay to predict a labeled example?
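The situation can be sketched with plain sets. All user/item names below are hypothetical; the point is only that a test pair absorbed into the enriched labeled set drops out of the predictable pairs:

```python
# Hypothetical tiny example of the situation described above.
original_train = {("u1", "i1"), ("u1", "i2"), ("u2", "i1")}
original_test = {("u2", "i2"), ("u3", "i1")}

# Suppose CoRec labels ("u2", "i2"), which happens to be an
# original test pair, plus one genuinely unknown pair.
corec_labeled = original_train | {("u2", "i2"), ("u3", "i2")}

# A model trained on the enriched set only predicts unlabeled pairs:
all_pairs = {(u, i) for u in ("u1", "u2", "u3") for i in ("i1", "i2")}
enriched_predictable = all_pairs - corec_labeled

# The absorbed test pair is no longer predicted:
print(("u2", "i2") in enriched_predictable)  # False
```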

If it's not okay, and we don't use the original test set when learning the model with the enriched labeled set, then the intersection of the prediction set and the test set (which is used for calculating RMSE) of the enriched model may differ from that of the original model, which makes the evaluation less reliable. For example, if we set the train-set parameter to the enriched labeled set and leave the test set empty, the prediction set will not contain the pair (u, i) mentioned above, because it belongs to neither the test set nor the unlabeled set of the enriched model, while it belongs to both the prediction set and the test set of the original model.
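Continuing the same hypothetical toy data, this is the mismatch in a few lines: the two models end up being scored on different subsets of the original test set, so their RMSE values are not directly comparable:

```python
# Same hypothetical sets as before.
original_train = {("u1", "i1"), ("u1", "i2"), ("u2", "i1")}
original_test = {("u2", "i2"), ("u3", "i1")}
corec_labeled = original_train | {("u2", "i2"), ("u3", "i2")}
all_pairs = {(u, i) for u in ("u1", "u2", "u3") for i in ("i1", "i2")}

# Each model predicts every pair missing from its own labeled set:
original_pred = all_pairs - original_train
enriched_pred = all_pairs - corec_labeled

# Pairs actually scored against the original test set differ:
print(sorted(original_pred & original_test))  # both test pairs
print(sorted(enriched_pred & original_test))  # ("u2", "i2") is missing
```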

Thank you for reading, and correct me if I'm wrong.