THUwangcy / ReChorus

“Chorus” of recommendation models: a light and flexible PyTorch framework for Top-K recommendation.
MIT License
545 stars, 91 forks

Dataset Preprocessing: Generating leave_df in amazon.ipynb #9

Closed. MinSeok-Pons-Kim closed this issue 3 years ago.

MinSeok-Pons-Kim commented 3 years ago

Hi, thanks for the great code! I would like to ask about the preprocessing code in Amazon.ipynb, where leave_df is created, which I think is unnecessary:

`leave_df = out_df.groupby('user_id').head(1)`

`data_df = out_df.drop(leave_df.index)`

If this is only meant to hold out the last two items for dev/test data, why is this step included here? I think it would hurt negative sampling, since the items in leave_df are excluded when generating test_df and dev_df.

Again, thanks for the great code!

MinSeok-Pons-Kim commented 3 years ago

I guess the intention is to keep users with only one item in the training data, although there is a risk of treating positive items as negative items.
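A minimal pandas sketch of the split discussed here, on a toy log. It is an illustration, not the notebook's code: it assumes out_df is ordered by time within each user, that item_id and time are the column names (only user_id appears in the thread), and that dev/test are the last two remaining interactions per user. It shows that reserving each user's first interaction keeps single-interaction users entirely in training and leaves every dev/test user with at least one history item.

```python
import pandas as pd

# Toy interaction log. Only 'user_id' comes from the thread; 'item_id' and
# 'time' are assumed column names for illustration.
out_df = pd.DataFrame({
    'user_id': [1, 1, 1, 1, 2, 2, 3],
    'item_id': [10, 11, 12, 13, 10, 14, 15],
    'time':    [1, 2, 3, 4, 1, 2, 1],
})
# Assumption: interactions are ordered by time within each user, so head(1)
# picks each user's earliest interaction.
out_df = out_df.sort_values(['user_id', 'time']).reset_index(drop=True)

# The two lines from the notebook: reserve one interaction per user for training.
leave_df = out_df.groupby('user_id').head(1)
data_df = out_df.drop(leave_df.index)

# Assumed leave-one-out split on the remainder:
# last item goes to test, second-to-last goes to dev.
test_df = data_df.groupby('user_id').tail(1)
data_df = data_df.drop(test_df.index)
dev_df = data_df.groupby('user_id').tail(1)
data_df = data_df.drop(dev_df.index)

# Training data = the reserved interactions plus whatever is left over.
train_df = pd.concat([leave_df, data_df]).sort_index()

print(test_df['item_id'].tolist())   # [13, 14]
print(dev_df['item_id'].tolist())    # [12]
print(train_df['item_id'].tolist())  # [10, 11, 10, 15]
```

User 3 has a single interaction, so it lands entirely in leave_df and never appears in test_df or dev_df; user 2 has two, so one stays in training and the other becomes its test item.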

THUwangcy commented 3 years ago

Yes, you are right. This is just to ensure that every dev/test sample has at least one history interaction. Besides, because we construct clicked_item_set before cutting off leave_df, positive items can never be regarded as negative items.
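The second point in the reply can be sketched the same way. This is a minimal illustration, not ReChorus's actual sampler: the helper sample_negatives, the uniform rejection sampling, the item-ID range, and the item_id/time column names are assumptions; only out_df, leave_df, data_df, and clicked_item_set come from the thread. Because each user's clicked set is computed from the full out_df, rejection sampling against it can never return an item from leave_df.

```python
import numpy as np
import pandas as pd

# Toy log; 'item_id' and 'time' are assumed column names.
out_df = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2],
    'item_id': [10, 11, 12, 10, 13],
    'time':    [1, 2, 3, 1, 2],
}).sort_values(['user_id', 'time']).reset_index(drop=True)
n_items = 20  # hypothetical: item IDs are drawn from 1..n_items

# Built from the FULL log, before leave_df is cut off, so it also contains
# the interactions that are reserved for training.
clicked_item_set = {u: set(items) for u, items in out_df.groupby('user_id')['item_id']}

leave_df = out_df.groupby('user_id').head(1)
data_df = out_df.drop(leave_df.index)
test_df = data_df.groupby('user_id').tail(1)


def sample_negatives(user_id, n_neg, rng):
    """Hypothetical helper: draw n_neg items (with replacement) the user never clicked."""
    clicked = clicked_item_set[user_id]
    negatives = []
    while len(negatives) < n_neg:
        item = int(rng.integers(1, n_items + 1))  # upper bound is exclusive
        if item not in clicked:  # reject any known positive
            negatives.append(item)
    return negatives


rng = np.random.default_rng(0)
negatives = {u: sample_negatives(u, 5, rng) for u in test_df['user_id']}
print(negatives)
```

Had clicked_item_set been built from data_df after the cut, item 10 (each user's held-back first interaction) would be missing from the set and could be drawn as a negative, which is exactly the concern raised in the question.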

MinSeok-Pons-Kim commented 3 years ago

Awesome! I can see a tremendous amount of effort and consideration in your code :) Thanks for the reply.