guvcolie closed this issue 4 years ago.
Please take a look at the "Benchmarking" section of the README, where in point 2 we discuss the interface to the Criteo Kaggle DAC dataset.
You will notice that we use a shell script, ./bench/dlrm_s_criteo_kaggle.sh, which specifies these parameters for you. You can also pass additional parameters to it afterwards, enclosed in quotes, such as:
./bench/dlrm_s_criteo_kaggle.sh "--test-freq=1024 --memory-map"
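The quotes matter because the shell then hands all extra options to the script as a single argument. A minimal Python sketch of building that command from a driver script shows the same tokenization. It assumes you run from the repository root. The helper names `build_bench_command` and `run_bench` are hypothetical and not part of the repository.

```python
import shlex
import subprocess

# Benchmark script path, relative to the repository root (assumption).
SCRIPT = "./bench/dlrm_s_criteo_kaggle.sh"


def build_bench_command(extra_options):
    """Return the argv list for the benchmark script (hypothetical helper).

    All extra options are joined into ONE argument, which is what the
    surrounding quotes achieve on a shell command line.
    """
    if not extra_options:
        return [SCRIPT]
    return [SCRIPT, " ".join(extra_options)]


def run_bench(extra_options):
    """Run the script; must be called from the repository root."""
    return subprocess.run(build_bench_command(extra_options), check=True)


# The shell splits the quoted command line the same way:
# shlex.split(SCRIPT + ' "--test-freq=1024 --memory-map"')
# gives [SCRIPT, "--test-freq=1024 --memory-map"]
```

Without the quotes, the shell would pass `--test-freq=1024` and `--memory-map` as two separate arguments.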
Thank you! If I want the split described as "In experiments, typically the 7th day is split into a validation and test set while the first 6 days are used as the training set," I must set the hyperparameter --data-randomize="day", right?
I recommend leaving the default --data-randomize="total"; you will still get the above split.
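Why the split survives either setting: the day-based split can be done first, with the randomization mode affecting only the order of the training samples. The following pure-Python sketch shows that idea. It is an illustration, not DLRM's actual code. It assumes samples are stored in chronological order with known per-day counts. The helper `split_by_day` is hypothetical, and the choice of which half of the last day becomes validation is arbitrary here.

```python
import random


def split_by_day(samples_per_day, randomize="total", seed=0):
    """Sketch of a day-based train/val/test split (hypothetical helper).

    samples_per_day: sample count of each day in chronological order
    (7 entries for Criteo Kaggle). randomize: "total", "day" or "none".
    Returns (train, val, test) lists of sample indices.
    """
    rng = random.Random(seed)
    # Give each day a contiguous range of sample indices.
    days, start = [], 0
    for n in samples_per_day:
        days.append(list(range(start, start + n)))
        start += n

    train_days, last_day = days[:-1], days[-1]
    if randomize == "day":
        # Shuffle within each training day; day order is preserved.
        for d in train_days:
            rng.shuffle(d)
    train = [i for d in train_days for i in d]
    if randomize == "total":
        # Shuffle across all training days at once.
        rng.shuffle(train)

    # The last day never enters training; split it into two halves.
    half = len(last_day) // 2
    val, test = last_day[:half], last_day[half:]
    return train, val, test
```

With seven days of 10 samples each, every mode gives training indices 0 to 59, validation 60 to 64 and test 65 to 69. Only the order of the training indices changes.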
OK, got it! Thank you!
I downloaded the raw Kaggle Criteo dataset, and it contains readme.txt, train.txt, and test.txt.
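To check what the raw files look like, here is a minimal parser sketch for a single line. It follows the layout described in the dataset's readme.txt: tab-separated fields with a label, 13 integer features and 26 categorical features hashed to 32-bit hex strings, where missing values are empty and test.txt has no label column. The helper `parse_line` and its fill-in values for missing fields are assumptions of this sketch. The repository's own preprocessing code should be used for actual training.

```python
NUM_DENSE = 13   # integer features I1..I13
NUM_SPARSE = 26  # categorical features C1..C26


def parse_line(line, has_label=True):
    """Parse one tab-separated Criteo Kaggle line (hypothetical helper).

    Returns (label, dense, sparse). Missing integer values become 0 and
    missing categorical values become None; these fill-ins are choices of
    this sketch, not part of the dataset format.
    """
    fields = line.rstrip("\r\n").split("\t")
    expected = NUM_DENSE + NUM_SPARSE + (1 if has_label else 0)
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    label = int(fields[0]) if has_label else None
    rest = fields[1:] if has_label else fields
    dense = [int(v) if v else 0 for v in rest[:NUM_DENSE]]
    # Categorical values are hashed IDs written as hex strings.
    sparse = [int(v, 16) if v else None for v in rest[NUM_DENSE:]]
    return label, dense, sparse
```

For test.txt, call `parse_line(line, has_label=False)`.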