ssmmyy closed this issue 5 years ago
Hello ssmmyy,
Thanks for your interest in our paper. When we submitted the initial version, we only sorted the DIGINETICA dataset by `eventdate` in `train-item-views.csv`, since we directly used the public preprocessing code released by NARM's authors; you can check the earlier versions on arXiv. However, someone later notified us that the user clicks within each session in the original dataset may still be out of order, and we assume that the relative order of clicks within a session is given by the `timeframe` field. We confirmed this with NARM's authors and reran the baselines accordingly. Since the dataset was published years ago, we could not find any other documentation explaining the meaning of each field.
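To make the two preprocessing choices concrete, here is a minimal Python sketch. It assumes `train-item-views.csv` is semicolon-delimited with the columns `sessionId;userId;itemId;timeframe;eventdate`; these names, the delimiter, and the toy rows are assumptions for illustration, not the NARM or SR-GNN code. With `sort_by_timeframe=False` clicks keep their file order (what a stable sort on the day-level `eventdate` alone would leave within a session); with `True` each session is re-sorted by `timeframe`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical toy rows in the assumed semicolon-delimited layout
# sessionId;userId;itemId;timeframe;eventdate (not real DIGINETICA data).
SAMPLE = """sessionId;userId;itemId;timeframe;eventdate
1;;A;300;2016-05-09
1;;B;100;2016-05-09
1;;C;200;2016-05-09
2;;D;50;2016-05-10
2;;E;10;2016-05-10
"""


def session_sequences(text, sort_by_timeframe):
    """Group item clicks by session.

    sort_by_timeframe=False keeps the order in which clicks appear in the
    text; a stable sort on the day-level eventdate alone would leave
    same-day clicks in this order. sort_by_timeframe=True re-orders the
    clicks of each session by the integer timeframe value.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    sessions = defaultdict(list)
    for row in reader:
        sessions[row["sessionId"]].append((row["itemId"], int(row["timeframe"])))

    result = {}
    for session_id, clicks in sessions.items():
        if sort_by_timeframe:
            # sorted() is stable, so clicks with equal timeframe keep file order.
            clicks = sorted(clicks, key=lambda click: click[1])
        result[session_id] = [item for item, _ in clicks]
    return result


print(session_sequences(SAMPLE, sort_by_timeframe=False))
# {'1': ['A', 'B', 'C'], '2': ['D', 'E']}
print(session_sequences(SAMPLE, sort_by_timeframe=True))
# {'1': ['B', 'C', 'A'], '2': ['E', 'D']}
```

The same item sequences feed every model, so whichever ordering is chosen, applying it identically to all baselines is what keeps the comparison consistent.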
I personally agree with your point, and I think it may explain why both performance metrics drop dramatically after the preprocessing code was changed. Most importantly, what we care about is whether all baselines follow the same configuration. If they do, then the comparison between SR-GNN and the other methods is fair.
I see that you have revised the experimental results in the latest version, on the grounds that the user clicks within each session in the DIGINETICA dataset are still out of order. But the user clicks within each session are actually already ordered; `timeframe` is the time spent browsing. Please see the `train-item-views.csv` file: the values in the `timeframe` column range from 1,000 to 2,000,000. They cannot be click times, because if the dataset used Unix timestamps, the values within a day should fall in (0, 86,400) for seconds, or in (0, 86,400,000) for milliseconds. And if the order after the change is correct, why do the results of all the experiments go down?
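Whether clicks are already stored in `timeframe` order can be checked directly against the file. The Python sketch below assumes a semicolon-delimited layout with columns `sessionId;userId;itemId;timeframe;eventdate` (an assumption, as are the toy rows). It counts how many sessions would change order if re-sorted by `timeframe` and reports the column's minimum and maximum.

```python
import csv
import io
from collections import defaultdict

# Hypothetical toy rows in the assumed layout
# sessionId;userId;itemId;timeframe;eventdate (not real DIGINETICA data).
SAMPLE = """sessionId;userId;itemId;timeframe;eventdate
1;;A;100;2016-05-09
1;;B;200;2016-05-09
2;;C;300;2016-05-10
2;;D;150;2016-05-10
"""


def timeframe_order_report(text):
    """Return (sessions_reordered, total_sessions, min_timeframe, max_timeframe).

    A session counts as reordered when its timeframe values, read in file
    order, are not already non-decreasing, i.e. sorting the session by
    timeframe would change its click order.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    per_session = defaultdict(list)
    for row in reader:
        per_session[row["sessionId"]].append(int(row["timeframe"]))

    reordered = sum(1 for values in per_session.values() if values != sorted(values))
    all_values = [v for values in per_session.values() for v in values]
    return reordered, len(per_session), min(all_values), max(all_values)


print(timeframe_order_report(SAMPLE))  # (1, 2, 100, 300)
```

On the real data one would pass the contents of `train-item-views.csv` instead of `SAMPLE`. A reordered count near zero would support the view that clicks are already stored in order; a large count means the two preprocessing variants produce genuinely different sequences.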