HKUDS / DCRec

[WWW'2023] "DCRec: Debiased Contrastive Learning for Sequential Recommendation"
https://arxiv.org/abs/2303.11780
MIT License

Problem of Evaluation Protocols #8

Closed Ocxs closed 9 months ago

Ocxs commented 9 months ago

In Section 3.1.2 of your paper, you mentioned, "We follow [11, 20, 30] to adopt the leave-one-out strategy for model evaluation. Specifically, we treat the last interaction of each user as testing data, and designate the previous one as validation data." However, the evaluation protocol in your released code does not correspond to the leave-one-out strategy. The log information for DCRec is as follows:

eval_args = {'mode': 'pop100', 'order': 'TO', 'split': {'RS': [0.8, 0.1, 0.1]}, 'group_by': 'user'}

In RecBole, the 'RS' setting splits the dataset into training, validation, and test sets by ratio, here 0.8/0.1/0.1.
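
To make the difference between the two protocols concrete, here is a minimal sketch in plain Python. It does not use RecBole and is not the repository's code; the helper names `leave_one_out` and `ratio_split` and the toy sequence are assumptions made for illustration, and RecBole's own rounding of split sizes may differ.

```python
from typing import List, Sequence, Tuple

Split = Tuple[List[int], List[int], List[int]]


def leave_one_out(seq: Sequence[int]) -> Split:
    """Last interaction -> test, the one before it -> validation, rest -> train."""
    items = list(seq)
    if len(items) < 3:
        # Too short to hold out two items; keep everything for training.
        return items, [], []
    return items[:-2], [items[-2]], [items[-1]]


def ratio_split(seq: Sequence[int], ratios=(0.8, 0.1, 0.1)) -> Split:
    """Time-ordered split of one user's sequence by ratio (illustrative only)."""
    items = list(seq)
    n_train = int(len(items) * ratios[0])
    n_valid = int(len(items) * ratios[1])
    return (
        items[:n_train],
        items[n_train:n_train + n_valid],
        items[n_train + n_valid:],
    )


# Hypothetical user with 20 time-ordered interactions (item ids 1..20).
user_seq = list(range(1, 21))
loo_train, loo_valid, loo_test = leave_one_out(user_seq)
rs_train, rs_valid, rs_test = ratio_split(user_seq)

print(len(loo_train), loo_valid, loo_test)
print(len(rs_train), rs_valid, rs_test)
```

Under leave-one-out, every user contributes exactly one validation item and one test item. Under the ratio split, the number of held-out items grows with the length of the user's sequence, so the two protocols evaluate on different targets.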

yuh-yang commented 9 months ago

Hi,

RecBole's automatic splitting function is deactivated when preprocessed data files are available. Please refer to lines 305-320 of run_DCRec.py.

Ocxs commented 9 months ago

Thank you for your response. So the test set follows the leave-one-out strategy, but the validation set is produced with the split_by_ratio method. With the default parameter settings, 90% of the dataset forms the training set and the remaining 10% serves as the validation set. Is my understanding correct?

Ocxs commented 9 months ago
[image: code screenshot]

Looking at this code snippet, it seems that if I set args.validation to True, the validation set is used as the test set. I also don't see any actual test set data being used for evaluation before the code finishes running. The results on the validation set are quite close to those reported in your paper. Could this be a problem?

yuh-yang commented 9 months ago

Hi,

The validation set is used to select the best checkpoint. You can then evaluate that checkpoint on the true test set to obtain the final performance results.
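
As a rough sketch of that workflow (not the code in run_DCRec.py), the checkpoint is chosen on validation scores only, and the test set is touched once, for the selected checkpoint. The function name `select_and_test`, the checkpoint names, and the metric values below are all made up for illustration.

```python
from typing import Callable, Iterable, Tuple


def select_and_test(
    checkpoints: Iterable[str],
    eval_valid: Callable[[str], float],
    eval_test: Callable[[str], float],
) -> Tuple[str, float]:
    """Pick the checkpoint with the best validation score, then test it once."""
    best = max(checkpoints, key=eval_valid)
    # The test metric is computed only for the selected checkpoint,
    # so it never influences model selection.
    return best, eval_test(best)


# Toy numbers (assumed): one metric per checkpoint, higher is better.
valid_scores = {"epoch_1": 0.21, "epoch_2": 0.27, "epoch_3": 0.25}
test_scores = {"epoch_1": 0.20, "epoch_2": 0.24, "epoch_3": 0.26}

best_ckpt, final_score = select_and_test(
    valid_scores, valid_scores.get, test_scores.get
)
print(best_ckpt, final_score)
```

In this toy example epoch_3 has the highest test score, but it is not selected, because selection only sees validation scores. Reporting the validation score of the chosen checkpoint as the final result would be optimistic, which is the concern raised above.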

yuh-yang commented 9 months ago

Closed due to inactivity. Feel free to reopen it if you have further questions.