HKUDS / DCRec

[WWW'2023] "DCRec: Debiased Contrastive Learning for Sequential Recommendation"
https://arxiv.org/abs/2303.11780
MIT License

Doubts about the data partitioning method #4

ZuGeYunQian closed this issue 1 year ago

ZuGeYunQian commented 1 year ago

Thank you very much for your paper; it has been very inspiring to me. I sent you an email yesterday, but in case you don't check your inbox often, I am also asking my question here on GitHub. The paper does not describe how the data were partitioned. For example, the ML-20M dataset originally contains 20 million ratings, but your experiments use only a small portion of it. Could you share the raw dataset used in your experiments in CSV format, or describe the data partitioning method? This is solely for academic purposes.

yuh-yang commented 1 year ago

Hi,

It's from RecBole's released datasets. https://recbole.io/dataset_list.html

ZuGeYunQian commented 1 year ago

Thank you very much for the link. However, I checked both the Google Drive and GitHub links and could only find the original version of the ML-20M dataset, not the version with 193,452 user interactions used in the paper. Could you let me know which part of the ML-20M dataset the paper used?

[Screenshot, 2023-08-02 21:12:53]
yuh-yang commented 1 year ago

Oh I got you,

I've uploaded my preprocessing script here: https://www.dropbox.com/s/nu711v3rluqx2k7/preprocess.ipynb?dl=0

It includes some filtering operations.
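The notebook's contents aren't reproduced in this thread, so the exact filtering operations are unknown. Purely as a hedged illustration, the sketch below shows two steps that are commonly used when preparing data for sequential recommendation: iterative k-core filtering (dropping sparse users and items) and a chronological leave-one-out split. The function names, the `(user, item, timestamp)` tuple format, and the default threshold of 5 are hypothetical assumptions, not taken from the DCRec preprocessing script.

```python
from collections import Counter, defaultdict

# Hypothetical illustration only; NOT the contents of the DCRec preprocess.ipynb.


def k_core_filter(interactions, k_user=5, k_item=5):
    """Repeatedly drop users/items with fewer than k interactions.

    `interactions` is a list of (user, item, timestamp) tuples. Removing a
    sparse item can push a user below the threshold (and vice versa), so we
    loop until a full pass removes nothing.
    """
    data = list(interactions)
    while True:
        user_counts = Counter(u for u, _, _ in data)
        item_counts = Counter(i for _, i, _ in data)
        kept = [
            (u, i, t)
            for u, i, t in data
            if user_counts[u] >= k_user and item_counts[i] >= k_item
        ]
        if len(kept) == len(data):
            return kept
        data = kept


def leave_one_out_split(interactions):
    """Chronological leave-one-out split per user.

    Returns (train, valid, test) dicts keyed by user:
      train[u] = all items except the last two
      valid[u] = (history, second-to-last item)
      test[u]  = (history, last item)
    Users with fewer than 3 interactions are skipped.
    """
    seqs = defaultdict(list)
    # sorted() is stable, so each user's items end up in timestamp order.
    for u, i, _ in sorted(interactions, key=lambda x: x[2]):
        seqs[u].append(i)

    train, valid, test = {}, {}, {}
    for u, items in seqs.items():
        if len(items) < 3:
            continue
        train[u] = items[:-2]
        valid[u] = (items[:-2], items[-2])
        test[u] = (items[:-1], items[-1])
    return train, valid, test
```

The filter is a loop rather than a single pass because the user and item thresholds interact: one pass can leave new users or items below k. RecBole's dataset configuration offers comparable interaction-count filters, so the actual preprocessing may rely on those instead; the notebook linked above is the authoritative reference.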

ZuGeYunQian commented 1 year ago

Thank you so much for providing the data I needed! Your generous help will be of great assistance to my research. I truly appreciate your support and kindness!