Section "Training factorization machines with pipe mode" in ch. 9 demonstrates how to train a recommender with pipe mode.
The train-test split is completely random: the units to randomize are the ratings, not the users. This means that most users will contribute to both the training and the test subsets.
Isn't it best practice to split on users instead, so that each user falls entirely in either the training subset or the test subset?
This is the relevant code snippet:
X, Y = loadDataset('ml-25m/ratings.csv', num_ratings, num_features)
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.05, random_state=59)
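For comparison, here is a minimal sketch of what I mean by a user-level split, using scikit-learn's GroupShuffleSplit. It assumes a `user_ids` array aligned with the rows of X, which the book's loadDataset does not return as far as I can tell; the toy arrays below are made up purely for illustration.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy stand-in for the ratings data (hypothetical values): one row per rating.
user_ids = np.array([1, 1, 1, 2, 2, 3, 3, 3, 4, 4])  # user behind each rating
X = np.arange(20).reshape(10, 2)                      # placeholder features
Y = np.array([1, 0, 1, 1, 0, 0, 1, 1, 0, 1])          # placeholder labels

# Split on users: here test_size is the fraction of users held out,
# not the fraction of ratings.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=59)
train_idx, test_idx = next(splitter.split(X, Y, groups=user_ids))

X_train, X_test = X[train_idx], X[test_idx]
Y_train, Y_test = Y[train_idx], Y[test_idx]

# Every user now appears in exactly one of the two subsets.
train_users = set(user_ids[train_idx])
test_users = set(user_ids[test_idx])
```

One caveat I can see: with a split like this, test users have no ratings at all in the training subset, so a model that learns per-user parameters would be evaluated in a cold-start setting. That may be why the book splits on ratings instead.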