Data splitting - GitHub issues

MTandHJ commented 8 months ago

Hi,

First, thanks for your work on multi-modal recommendation. I downloaded the Baby dataset and found that the train:valid:test sizes are 118551:20559:21682, which does not appear to match the 7:1:2 ratio suggested in 0rating2inter.ipynb.

Thanks!

enoche commented 8 months ago

@MTandHJ Thanks for your feedback. The splitting ratio should be 8:1:1; I've updated the script. Thanks a lot.

MTandHJ commented 8 months ago

Hi,

Thanks for your quick reply! I split the data using a ratio of 8:1:1 (which yields 128595,15983,16214) but still cannot reproduce the results.
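For reference, the two sets of sizes quoted in this thread can be compared directly. The snippet below is only an arithmetic check on those numbers; the dictionary keys and the `fractions` helper are names made up for illustration, not anything from the MMRec code.

```python
# Split sizes (train, valid, test) as reported in this thread for the Baby dataset.
splits = {
    "downloaded": (118551, 20559, 21682),
    "re-split 8:1:1": (128595, 15983, 16214),
}

def fractions(sizes):
    """Return each split's share of the total, rounded to 3 decimals."""
    total = sum(sizes)
    return [round(s / total, 3) for s in sizes]

for name, sizes in splits.items():
    # Print the total number of interactions and the share of each split.
    print(name, sum(sizes), fractions(sizes))
```

Both splits cover the same 160792 interactions, so the difference is only in how they are assigned: the downloaded data holds out noticeably more than 20% for validation and test.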

Thanks!

enoche commented 8 months ago

Hi, @MTandHJ, appreciate your feedback.

> I split the data using a ratio of 8:1:1 (which yields 128595,15983,16214) but still cannot reproduce the results.

The reason is in https://github.com/enoche/MMRec/blob/master/preprocessing/1splitting.ipynb, cell In [6]:

    if n_items < 10:
        tmp_ls = [0] * (n_items - 2) + [1] + [2]

Some users have fewer than 10 interactions. To ensure that each of these users still has one validation and one test instance, we force the split for them: one interaction is assigned to validation, one to test, and the rest to training.

The splitting in https://github.com/enoche/MMRec/blob/master/preprocessing/0rating2inter.ipynb has no impact on the final data. Thanks.
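To see why this pushes the overall ratio away from 8:1:1, here is a minimal sketch of a per-user split that includes the rule quoted above. Only the `n_items < 10` branch comes from the notebook; the function name `split_labels`, its parameters, the reading of labels as 0 = train, 1 = valid, 2 = test, and the whole branch for users with 10 or more interactions are assumptions for illustration and may differ from what 1splitting.ipynb actually does.

```python
def split_labels(n_items, holdout=0.2, small_user_cutoff=10):
    """Return one label per interaction of a single user.

    Assumed label meaning: 0 = train, 1 = valid, 2 = test.
    """
    if n_items < 2:
        # The forced rule needs at least one valid and one test instance.
        raise ValueError("need at least 2 interactions per user")
    if n_items < small_user_cutoff:
        # Rule quoted from the notebook: exactly one valid and one test instance.
        return [0] * (n_items - 2) + [1] + [2]
    # Assumed behaviour for larger users: hold out ~20%, halved into valid/test.
    n_holdout = int(n_items * holdout)
    n_valid = n_holdout // 2
    n_test = n_holdout - n_valid
    n_train = n_items - n_holdout
    return [0] * n_train + [1] * n_valid + [2] * n_test

# A user with 5 interactions ends up with 2 of 5 (40%) held out,
# while a user with 20 interactions ends up with 4 of 20 (20%).
for n in (5, 20):
    labels = split_labels(n)
    held_out = sum(1 for x in labels if x != 0)
    print(n, labels, held_out / n)
```

Under these assumptions, every user with fewer than 10 interactions contributes more than 20% of their interactions to validation and test, so a dataset with many such users would end up with a training share below 80%, which is at least consistent with the sizes reported above.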

MTandHJ commented 8 months ago

Thanks!

enoche / MMRec

Data splitting #26