karthikraja95 closed this issue 5 years ago
Could you please share the script you used to convert your validation dataframes to CSR matrices?
X_test is the test dataset (it contains 20% of the whole data).
val_train, val_test = train_test_split(X_test, test_size=0.50, random_state=42)
I used the following code to create the validation CSR matrices:
val_input_matrix, val_input_item_id_map, val_input_user_id_map = dataframe_to_csr_matrix(val_train, user_col='userID', item_col='itemID', inter_col='rating')
val_test_matrix, val_test_item_id_map, val_test_user_id_map = dataframe_to_csr_matrix(val_test, user_col='userID', item_col='itemID', inter_col='rating')
Then I saved them as npz files with the following code:
save_npz('val_input.npz', matrix=val_input_matrix)
save_npz('val_test.npz', matrix=val_test_matrix)
After that, I ran the script from my previous comment, which produced the traceback.
When converting your validation dataframe, you have to pass the item_id_map returned when converting your training dataframe to dataframe_to_csr_matrix, so that your validation data maps to the same items as the training data (the same goes for user_id_map, but that is not necessary in your case).
val_input_matrix, val_input_item_id_map, val_input_user_id_map = dataframe_to_csr_matrix(val_train, user_col='userID', item_col='itemID', inter_col='rating', item_id_map=item_id_map, user_id_map=user_id_map )
val_test_matrix, val_test_item_id_map, val_test_user_id_map = dataframe_to_csr_matrix(val_test, user_col='userID', item_col='itemID', inter_col='rating', item_id_map=item_id_map, user_id_map=user_id_map )
I tried the above code, and now I get a negative column found error.
Here's the traceback:
ValueError Traceback (most recent call last)
It seems you have items in the validation dataset that are missing from the training set (I replicated the exception on my side).
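If the validation frame does contain items that never appear in training, one option is to drop those rows before conversion. This is an assumption about how you might want to handle it, not something stated in this thread as recoder's behavior. A minimal pandas sketch, where drop_unseen_items is a hypothetical helper name and the column names follow the ones used above:

```python
import pandas as pd


def drop_unseen_items(train_df, val_df, item_col="itemID"):
    """Keep only validation rows whose item also appears in the training frame."""
    seen_items = set(train_df[item_col].unique())
    mask = val_df[item_col].isin(seen_items)
    return val_df[mask].reset_index(drop=True)


# Toy data: item "z" exists only in the validation frame.
train_df = pd.DataFrame({"userID": [1, 1, 2], "itemID": ["a", "b", "a"], "rating": [5, 3, 4]})
val_df = pd.DataFrame({"userID": [3, 3], "itemID": ["a", "z"], "rating": [4, 2]})

filtered = drop_unseen_items(train_df, val_df)
print(filtered)
```

Whether dropping such rows is acceptable depends on how many interactions it removes; checking the size of the frame before and after is a cheap sanity test.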
One more thing to note: in your case, if the users in the training dataset are not the same as the ones in the validation dataset, there is no need to pass the user_id_map returned when converting the training set. However, when converting the test split of the validation set, you should pass to dataframe_to_csr_matrix the user_id_map returned when converting the input split of the validation set.
Okay, let me try that and get back to you.
Hi @amoussawi
Now I have the same users in the train and test sets, and I followed the procedure you mentioned.
Here's the code:
X_train = pd.read_pickle('train_stratified_split.pkl')
X_test = pd.read_pickle('test_stratified_split.pkl')
val_train, val_test = train_test_split(X_test, test_size=0.50, random_state=42)
train_matrix, item_id_map, user_id_map = dataframe_to_csr_matrix(X_train, user_col='userID', item_col='itemID', inter_col='rating')
val_input_matrix, val_input_item_id_map, val_input_user_id_map = dataframe_to_csr_matrix(val_train, user_col='userID', item_col='itemID', inter_col='rating', user_id_map=user_id_map)
val_test_matrix, val_test_item_id_map, val_test_user_id_map = dataframe_to_csr_matrix(val_test, user_col='userID', item_col='itemID', inter_col='rating', user_id_map=user_id_map)
save_npz('val_input.npz', matrix=val_input_matrix)
save_npz('val_test.npz', matrix=val_test_matrix)
train_matrix_1 = sparse.load_npz('train.npz')
val_input_matrix_1 = sparse.load_npz('val_input.npz')
val_test_matrix_1 = sparse.load_npz('val_test.npz')
train_dataset_1 = RecommendationDataset(train_matrix_1)
val_dataset_1 = RecommendationDataset(val_input_matrix_1, val_test_matrix_1)
model = DynamicAutoencoder()
model_file = 'autoencoder_epoch_20.model'
testing = DynamicAutoencoder()
metrics = [NDCG(k=10)]
test_recoder = Recoder(model=model, use_cuda=False)
test_recoder.init_from_model_file(model_file)
No errors up to this point; everything works.
When I run the following code, it throws an error. I tried to debug it but couldn't.
num_recommendations = 10
test_recoder.evaluate(eval_dataset=val_dataset_1, num_recommendations=num_recommendations, metrics=metrics, batch_size=100)
Here's the traceback:
RuntimeError Traceback (most recent call last)
You are passing user_id_map instead of item_id_map.
Here's how it should be:
# Training set: builds the item and user ID maps.
train_matrix, item_id_map, user_id_map = dataframe_to_csr_matrix(
    X_train,
    user_col='userID',
    item_col='itemID',
    inter_col='rating')
# Validation input split: reuse the training item map.
val_input_matrix, val_input_item_id_map, val_input_user_id_map = dataframe_to_csr_matrix(
    val_train,
    user_col='userID',
    item_col='itemID',
    inter_col='rating',
    item_id_map=item_id_map)
# Validation test split: reuse the training item map and the input split's user map.
val_test_matrix, val_test_item_id_map, val_test_user_id_map = dataframe_to_csr_matrix(
    val_test,
    user_col='userID',
    item_col='itemID',
    inter_col='rating',
    item_id_map=item_id_map,
    user_id_map=val_input_user_id_map)
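To illustrate why the maps matter, here is a minimal sketch built around a hypothetical stand-in helper, to_csr. It is not recoder's dataframe_to_csr_matrix and only mimics the idea of reusing ID maps. Reusing the training item map keeps the validation columns aligned with the training columns, and reusing the input split's user map keeps the rows of the two validation matrices aligned:

```python
import pandas as pd
from scipy.sparse import csr_matrix


def to_csr(df, user_map=None, item_map=None):
    """Hypothetical stand-in helper (NOT recoder's implementation)."""
    # Build fresh maps only when none are supplied.
    if user_map is None:
        user_map = {u: i for i, u in enumerate(pd.unique(df["userID"]))}
    if item_map is None:
        item_map = {it: j for j, it in enumerate(pd.unique(df["itemID"]))}
    rows = df["userID"].map(user_map)
    cols = df["itemID"].map(item_map)
    # IDs absent from a supplied map come back as NaN; fail loudly here.
    if rows.isna().any() or cols.isna().any():
        raise ValueError("dataframe contains users or items missing from the given maps")
    data = df["rating"].to_numpy(dtype=float)
    shape = (len(user_map), len(item_map))
    matrix = csr_matrix((data, (rows.to_numpy(dtype=int), cols.to_numpy(dtype=int))), shape=shape)
    return matrix, item_map, user_map


train_df = pd.DataFrame({"userID": [1, 1, 2], "itemID": ["a", "b", "c"], "rating": [5, 4, 3]})
val_input_df = pd.DataFrame({"userID": [10, 11], "itemID": ["a", "c"], "rating": [4, 5]})
val_test_df = pd.DataFrame({"userID": [11, 10], "itemID": ["a", "b"], "rating": [2, 3]})

train_m, item_map, _ = to_csr(train_df)
val_in, _, val_user_map = to_csr(val_input_df, item_map=item_map)
val_out, _, _ = to_csr(val_test_df, user_map=val_user_map, item_map=item_map)

# Same number of columns as training, and identical shapes for both validation splits.
print(train_m.shape, val_in.shape, val_out.shape)
```

In this sketch a validation row with an item missing from the training map raises a ValueError up front, which is easier to diagnose than an index error further down the pipeline.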
Okay, now I have to split the dataset based on items rather than users. After that, I will try the above code and get back to you. Thanks, @amoussawi.
Closing. Feel free to reopen if you are still facing any issue.
Hi @amoussawi
I split my dataset into 80% training and 20% testing, and then split the test set into two halves of 50% each. One half is used as input to the model to generate predictions, and the other is the one the model's predictions are evaluated on.
Then I passed those two test splits to dataframe_to_csr_matrix to get CSR matrices and saved them as npz files.
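For reference, the 80/20 and 50/50 split described above could be sketched as follows. This uses toy data and pandas DataFrame.sample as a stand-in for train_test_split, so the variable names and the data are assumptions for illustration only:

```python
import pandas as pd

# Toy interaction data; column names follow the ones used in this thread.
ratings = pd.DataFrame({
    "userID": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "itemID": list("abcdefghij"),
    "rating": [5, 4, 3, 5, 2, 4, 5, 1, 3, 4],
})

# 80% train / 20% held out.
train_df = ratings.sample(frac=0.8, random_state=42)
held_out = ratings.drop(train_df.index)

# Split the held-out rows 50/50: one half is fed to the model, the other scores it.
val_input = held_out.sample(frac=0.5, random_state=42)
val_test = held_out.drop(val_input.index)

print(len(train_df), len(val_input), len(val_test))
```

Note that a plain row-wise split like this does not guarantee that every held-out user or item also appears in the other split.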
I trained the model on the 80% training split and saved it as 'autoencoder_epoch_20.model'. I used CPUs, not GPUs.
Then I ran the following lines:
val_input_matrix_1 = sparse.load_npz('val_input.npz')
val_test_matrix_1 = sparse.load_npz('val_test.npz')
val_dataset_1 = RecommendationDataset(val_input_matrix_1, val_test_matrix_1)
from recoder.metrics import AveragePrecision, Recall, NDCG
model_file = 'autoencoder_epoch_20.model'
model = DynamicAutoencoder()
metrics = [NDCG(k=10)]
test_recoder = Recoder(model=model, use_cuda=False)
test_recoder.init_from_model_file(model_file)
num_recommendations = 10
test_recoder.evaluate(eval_dataset=val_dataset_1, num_recommendations=num_recommendations, metrics=metrics)
The program works up to the point where the saved model is loaded, and then it fails during the evaluation phase. Can you help me solve it?
Any help would be appreciated.
Here's the traceback:
RuntimeError Traceback (most recent call last)