HKUDS / MMSSL

[WWW'2023] "MMSSL: Multi-Modal Self-Supervised Learning for Recommendation"
https://arxiv.org/abs/2302.10632

Raw dataset about Tiktok #2

Closed: xinzhou-ai closed this issue 1 year ago

xinzhou-ai commented 1 year ago

Thanks for sharing the code for your great work. I've noticed that the pre-processed Tiktok dataset you provide seems different from the one used in DualGNN.

MMSSL Recall@20: 0.0921 < DualGNN Recall@10: 0.1318

As this dataset is used inconsistently across papers, could you also provide the raw TikTok dataset and describe how you preprocessed it into multimodal features? Your efforts are greatly appreciated. Thanks.
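For readers following the metric comparison above, here is a minimal sketch of Recall@K for a single user. It is not taken from the MMSSL or DualGNN code; the function name `recall_at_k` and the toy item ids are made up for illustration. It shows that, for a fixed ranking and test set, Recall@K cannot decrease as K grows, which is why a Recall@20 below another paper's Recall@10 points to a difference in data or evaluation protocol rather than in K alone.

```python
def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of a user's held-out items that appear in the top-k of the ranking."""
    if not relevant_items:
        return 0.0
    top_k = set(ranked_items[:k])
    return len(top_k & set(relevant_items)) / len(relevant_items)


# Toy ranking for one user (hypothetical item ids, best-scored item first).
ranked = [7, 3, 9, 1, 4, 8, 2, 6, 0, 5, 11, 10]
held_out = [9, 5, 10]  # hypothetical test interactions of that user

r_at_5 = recall_at_k(ranked, held_out, 5)
r_at_10 = recall_at_k(ranked, held_out, 10)
r_at_12 = recall_at_k(ranked, held_out, 12)

# Recall@K is non-decreasing in K for the same ranking and test set.
print(r_at_5, r_at_10, r_at_12)
```

In practice the per-user values are averaged over all test users; that step is omitted here to keep the sketch short.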

Rilke123 commented 1 year ago

Thanks for sharing the dataset for your great work. I wonder if you could provide the interaction data for the Tiktok dataset. Thanks. Email: whc251013@163.com

weiwei1206 commented 1 year ago

We are delighted by your interest in our work. The website for the Tiktok dataset has been shut down, and we are unable to obtain the original dataset. Instead, we have provided a larger preprocessed version of Tiktok (https://pan.baidu.com/s/10hhz1y7__XixcLo7MCTByQ?pwd=wiak, code: wiak).

The dataset preprocessing took a lot of work. It is very important to us that you cite our work if you use our pre-processed dataset (for instance, by mentioning the source in the dataset introduction section).

Your interest in our work is much appreciated, and we wish you good luck with your future scientific endeavors.

enoche commented 1 year ago

Hi, weiwei, thanks for your effort in providing the preprocessed datasets. May we have a copy of the raw dataset from before your preprocessing? We would like to try other multimodal feature embedding models.

Also, could you detail how you preprocessed the raw data into the V/T/A features, which are stored in *.npy files? Only textual features are mentioned in your paper. Thanks.

Is your dataset the same as the one used in DualGNN (TMM)?
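Since the thread does not document how the V/T/A features were produced, one thing a reader can still do is inspect the `.npy` files directly. The sketch below is only an illustration: the file names (`image_feat.npy`, `text_feat.npy`, `audio_feat.npy`) and the feature dimensions are assumptions, not the names or sizes used in the released dataset, and the files are generated on the fly so the example runs on its own.

```python
import os
import tempfile

import numpy as np


def describe_features(directory, filenames):
    """Load each .npy feature matrix and return its shape as (rows, feature_dim)."""
    shapes = {}
    for name in filenames:
        matrix = np.load(os.path.join(directory, name))
        shapes[name] = matrix.shape
    return shapes


# Hypothetical file names and feature dimensions, chosen only for this demo.
dims = {"image_feat.npy": 4, "text_feat.npy": 6, "audio_feat.npy": 8}
n_items = 5

with tempfile.TemporaryDirectory() as tmp:
    rng = np.random.default_rng(0)
    for name, dim in dims.items():
        # Write a random stand-in matrix with one row per item.
        features = rng.normal(size=(n_items, dim)).astype(np.float32)
        np.save(os.path.join(tmp, name), features)

    shapes = describe_features(tmp, dims)

# Each modality should have one row per item; the row count is worth checking
# against the number of items in the interaction data.
for name, shape in shapes.items():
    print(name, shape)
```

Pointing `describe_features` at the real download directory with the actual file names would show whether the three modalities share a row count and what embedding size each one uses.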

weiwei1206 commented 1 year ago

Sorry, we were not aware of DualGNN before. We have just looked it up, but based on the resources we found, DualGNN does not make its dataset available; it provides only a toy version of the dataset.

weiwei1206 commented 1 year ago

We have provided all the datasets for this work in readme.md, as well as a larger version of Tiktok. We followed the LATTICE framework because it provides an available dataset to support the code. Testing in the LATTICE framework ranks all items, which may be the reason for the different results. In addition, different test sets lead to significant differences in results, which may also be a factor to consider. Besides, we have just looked up DualGNN, but based on the resources we found, DualGNN does not make its dataset available; it provides only a toy version of the dataset, as MMGCN and GRCN do.
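To illustrate why the ranking protocol matters, the sketch below compares all-item ranking with a sampled-candidate protocol on synthetic scores. This is an assumption-laden toy: the sampled protocol (99 random negatives plus the positive) is a common alternative used here only for contrast, and nothing in this thread states that DualGNN or any other paper evaluates that way. The scores are random stand-ins for a model, and each user has a single held-out item, so Recall@K equals the hit rate.

```python
import numpy as np


def hit_at_k(scores, positive, candidates, k):
    """Return 1 if `positive` is among the k highest-scoring candidates, else 0."""
    candidates = np.asarray(candidates)
    order = np.argsort(-scores[candidates])  # best candidate first
    top_k = candidates[order[:k]]
    return int(positive in top_k)


rng = np.random.default_rng(0)
n_users, n_items, k, n_negatives = 200, 1000, 20, 99  # illustrative sizes

full_hits, sampled_hits = 0, 0
all_items = np.arange(n_items)
for _ in range(n_users):
    scores = rng.normal(size=n_items)      # stand-in for model scores
    positive = int(rng.integers(n_items))  # one held-out item per user
    scores[positive] += 1.0                # give the positive a modest edge

    # Sampled protocol: rank the positive against a random subset of negatives.
    negatives = rng.choice(np.delete(all_items, positive),
                           size=n_negatives, replace=False)
    sampled = np.append(negatives, positive)

    full_hits += hit_at_k(scores, positive, all_items, k)
    sampled_hits += hit_at_k(scores, positive, sampled, k)

full_recall = full_hits / n_users
sampled_recall = sampled_hits / n_users
print(f"Recall@{k} (all-item rank): {full_recall:.3f}")
print(f"Recall@{k} (sampled candidates): {sampled_recall:.3f}")
```

Because the sampled candidates are a subset of all items, a hit under all-item ranking is always a hit under the sampled protocol, so the sampled number can only be equal or higher. Numbers produced under different protocols are therefore not directly comparable.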

enoche commented 1 year ago

Noted with thanks.

xinzhou-ai commented 1 year ago

Thanks for your reply. I agree that different settings result in different performance. As the raw datasets are not available, it seems impossible to reproduce their (MMGCN, DualGNN) performance from scratch.