Closed. MinSeok-Pons-Kim closed this issue 3 years ago.
I guess the intention of doing it this way is to keep users with only one item in the training data, although there is a risk of including positive items as negative items.
Yes, you are right. This is just to ensure that all the dev/test samples have at least one history interaction. Besides, because we construct clicked_item_set
before cutting off leave_df, positive items have no chance of being regarded as negative items.
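
To make the order of operations concrete, here is a minimal plain-Python sketch of the idea (not the actual notebook code). The toy `interactions` log and the helpers `split_user` and `sample_negatives` are hypothetical names made up for illustration; only `clicked_item_set` and the role of `leave_df` come from the discussion above. It assumes each user's interactions are already sorted by time.

```python
import random
from collections import defaultdict

# Hypothetical toy log of (user_id, item_id), time-ordered within each user.
interactions = [
    (1, 10), (1, 11), (1, 12), (1, 13),
    (2, 10), (2, 14),
    (3, 15),
]
all_items = sorted({item for _, item in interactions})

# Build the clicked set from the FULL log, before anything is held out,
# so the first (left-out) item is still known as a positive.
clicked_item_set = defaultdict(set)
sequences = defaultdict(list)
for user, item in interactions:
    clicked_item_set[user].add(item)
    sequences[user].append(item)


def split_user(seq):
    """Return (train, dev, test) item lists for one user's sequence."""
    first, rest = seq[:1], seq[1:]  # 'first' plays the role of leave_df
    test, rest = rest[-1:], rest[:-1]  # last remaining item goes to test
    dev, rest = rest[-1:], rest[:-1]   # next-to-last remaining item goes to dev
    return first + rest, dev, test     # the first item always stays in train


def sample_negatives(user, k, rng):
    """Sample up to k items the user never interacted with."""
    candidates = [i for i in all_items if i not in clicked_item_set[user]]
    return rng.sample(candidates, min(k, len(candidates)))


splits = {user: split_user(seq) for user, seq in sequences.items()}
negatives = sample_negatives(1, 2, random.Random(0))
```

In this sketch, user 3 (a single interaction) stays entirely in the training data, and every dev/test item has at least one earlier item as history. Because `clicked_item_set` is built before the split, the held-out first item can never be drawn as a negative.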
Awesome! I can see the tremendous amount of effort and consideration you put into the code :) Thanks for the reply.
Hi, thanks for the great code! I would like to ask about the preprocessing code in Amazon.ipynb, where leave_df is created, which I don't think it should be. `leave_df = out_df.groupby('user_id').head(1)`
`data_df = out_df.drop(leave_df.index)` If the goal is just to hold out the last two items for dev/test data, why is this step included? I think it would hurt negative sampling, since the items in leave_df are excluded when generating test_df and dev_df.
Again, thanks for the great code!