facebookresearch / dlrm

An implementation of a deep learning recommendation model (DLRM)
MIT License
3.72k stars, 825 forks

How should I set the parameters of CriteoDataset()? #78

Closed: guvcolie closed this issue 4 years ago

guvcolie commented 4 years ago

I downloaded the raw Criteo Kaggle dataset, which contains readme.txt, train.txt and test.txt.

  1. How can I get the training, validation and test sets using your dlrm_data_pytorch.py?
  2. How do I set the raw_path, pro_data and memory_map parameters when training? Thank you!

mnaumovfb commented 4 years ago

Please take a look at the "Benchmarking" section of the README, where point 2 discusses the interface to the Criteo Kaggle DAC dataset.

You will notice that we use a shell script ./bench/dlrm_s_criteo_kaggle.sh, which specifies these parameters for you. You can also pass additional parameters in quotes afterwards, such as ./bench/dlrm_s_criteo_kaggle.sh "--test-freq=1024 --memory-map"

guvcolie commented 4 years ago

Thank you! The dataset description says: "In experiments, typically the 7th day is split into a validation and test set while the first 6 days are used as the training set." To get that split, must I set the parameter randomize="day"?

mnaumovfb commented 4 years ago

I recommend leaving the default --data-randomize="total". You will still get the above split, with the first 6 days used for training and the 7th day divided into validation and test sets.
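
To illustrate why the randomize setting need not change the split itself, here is a minimal sketch, not the repository's code. It assumes the samples are stored in day order and that the number of samples per day is known. The names split_by_day and day_sizes are hypothetical, and which half of the last day becomes validation versus test is also an assumption. In this sketch the randomize option only affects the order of the training samples.

```python
import numpy as np


def split_by_day(day_sizes, randomize="total", seed=0):
    """Return (train, val, test) index arrays for samples stored day by day.

    Hypothetical helper: all days except the last form the training set,
    and the last day is cut into two halves for validation and test.
    """
    rng = np.random.default_rng(seed)
    total = int(np.sum(day_sizes))

    # Boundaries between consecutive days, e.g. sizes [3, 3] -> boundary [3].
    boundaries = np.cumsum(day_sizes)[:-1]
    per_day = np.array_split(np.arange(total), boundaries)

    if randomize == "day":
        # Shuffle within each training day, keeping the days in order.
        per_day[:-1] = [rng.permutation(d) for d in per_day[:-1]]

    train = np.concatenate(per_day[:-1])

    # Split the last day into two halves (assumed: first half test, second val).
    test, val = np.array_split(per_day[-1], 2)

    if randomize == "total":
        # Shuffle the training samples across all training days.
        train = rng.permutation(train)

    return train, val, test


if __name__ == "__main__":
    # Seven toy "days" of four samples each.
    train, val, test = split_by_day([4] * 7, randomize="total")
    print(len(train), len(val), len(test))
```

With either "day" or "total", the training set here contains exactly the samples from the first six days, and the validation and test sets come only from the seventh.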

guvcolie commented 4 years ago

OK, got it! Thank you!