facebookresearch / dlrm

An implementation of a deep learning recommendation model (DLRM)
MIT License
3.71k stars 825 forks

Unable to preprocess Criteo Kaggle Display Advertising Challenge Dataset #374

Open JerryQGui opened 7 months ago

JerryQGui commented 7 months ago

I have downloaded and unzipped the 4GB dataset. It consists of three files: readme.txt, train.txt, and test.txt. They are stored in a folder called dataset, which is a sibling of my cloned dlrm folder.

I believe this is the command needed to preprocess the data, as implied in the README:

python dlrm_s_pytorch.py --raw-data-file=../dataset/train.txt

However, the output is:

world size: 1, current rank: 0, local rank: 0
Using CPU...
time/loss/accuracy (if enabled):
Finished training it 1/1 of epoch 0, -1.00 ms/it, loss 0.083850

I have seen another issue, #274, where someone posted the lines that should be printed when preprocessing occurs, for example:

Reading raw data=/my_raw_data_path/train.txt

Additionally, there are no .npz files in my input directory.
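A check along the following lines can confirm both observations: that the three raw files are present and that no .npz files were written. This is only a sketch. The helper names `missing_raw_files` and `find_npz_files` are hypothetical and not part of the dlrm repository, and the `../dataset` path assumes the folder layout described above.

```python
from pathlib import Path


def missing_raw_files(data_dir, expected=("readme.txt", "train.txt", "test.txt")):
    """Return the expected raw file names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


def find_npz_files(data_dir):
    """Return the sorted names of .npz files directly inside data_dir."""
    return sorted(p.name for p in Path(data_dir).glob("*.npz"))


if __name__ == "__main__":
    # Assumed layout: the dataset folder is a sibling of the cloned dlrm folder,
    # and this script is run from inside the dlrm folder.
    data_dir = Path("../dataset")
    print("missing raw files:", missing_raw_files(data_dir))
    print(".npz files found:", find_npz_files(data_dir))
```

An empty list from `find_npz_files` matches what I see: nothing was written next to train.txt.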

Is it because there are some other required flags?