
YiyanXu / DiffRec

Diffusion Recommender Model


Dataset split #4

Closed gorgeousdays closed 1 year ago

gorgeousdays commented 1 year ago

Hi,

I read in the paper that the sorted interactions are split into training, validation, and testing sets with a ratio of 7:1:2. But the validation set in this repository is clearly larger than the test set, more like 7:2:1. Is there a problem here?

Best.

YiyanXu commented 1 year ago

Do you mean the datasets under noisy training? We split the sorted interactions into training, validation, and testing sets with a ratio of 7:1:2 in the clean setting. The noisy setting keeps the same testing set as the clean setting but adds some noisy interactions to the training and validation sets. Therefore, the validation set is larger than the testing set in the noisy setting.
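For readers following along, here is a minimal sketch of what a 7:1:2 split of already-sorted interactions could look like. It is not taken from the DiffRec repository: the function name `split_sorted`, the toy `(user, item, timestamp)` tuples, and the choice to split one global list (rather than per user) are all assumptions made for illustration.

```python
def split_sorted(interactions, ratios=(7, 1, 2)):
    """Split an already-sorted list into train/valid/test by integer ratios.

    Hypothetical helper for illustration only; the actual preprocessing
    used for the released datasets is not shown in this thread.
    """
    n = len(interactions)
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_valid = n * ratios[1] // total
    train = interactions[:n_train]
    valid = interactions[n_train:n_train + n_valid]
    test = interactions[n_train + n_valid:]  # remainder goes to test
    return train, valid, test


# Toy data: 20 (user, item, timestamp) tuples, sorted by timestamp.
toy = sorted(((0, item, 100 + item) for item in range(20)), key=lambda x: x[2])
train, valid, test = split_sorted(toy)
print(len(train), len(valid), len(test))
```

With a 7:1:2 split the test set should be about twice the size of the validation set, which is the property the question above is checking.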

gorgeousdays commented 1 year ago

Thanks for your reply. Specifically, I extracted the ml-1m clean dataset, which includes the train, valid, and test files. I checked the data and found that the three files contain 403277, 110722, and 57532 interactions respectively. This looks like 7:2:1. Another question: were the experimental results in the paper obtained with tst_w_val set to True?
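The arithmetic behind that observation can be checked quickly. The snippet below only uses the three interaction counts quoted in this comment; the variable names are illustrative.

```python
# Interaction counts reported in this thread for the ml-1m clean dataset.
counts = {"train": 403277, "valid": 110722, "test": 57532}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.3f}")
```

The shares come out at roughly 0.706, 0.194, and 0.101, so the released files are close to 7:2:1 (train:valid:test) rather than 7:1:2.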

YiyanXu commented 1 year ago

You're welcome :) All the results (including our methods and other baselines) reported in the paper were obtained by tst_w_val=False.

akajinchen commented 2 months ago

Quoting YiyanXu: "You're welcome :) All the results (including our methods and other baselines) reported in the paper were obtained by tst_w_val=False."

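Hello, I also noticed that the dataset split is not the 7:1:2 stated in the paper. Were the results in the paper obtained with the released dataset split, or is there a mistake in the released dataset, with the test and valid sets swapped?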