facebookresearch / dlrm

An implementation of a deep learning recommendation model (DLRM)
MIT License

test.txt problem #388

Open janghobaek2125 opened 1 month ago

janghobaek2125 commented 1 month ago

I am training a model on the Criteo Kaggle dataset, which consists of train.txt and test.txt.

train.txt is preprocessed correctly, and training completes successfully.

However, test.txt, which I want to use for inference, does not seem to be preprocessed properly.

What could be the problem? I ran:

python data_utils.py --raw-data-file=/data/janghobaek/test.txt
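A minimal sketch of one likely cause, assuming test.txt has the same tab-separated layout as train.txt but without the leading label column (13 integer features followed by 26 hex-encoded categorical features, so 39 fields per row instead of 40). It mimics the `line[1:14]` slicing that data_utils.py applies to the integer features (visible at the bottom of the traceback in this issue). The helper `parse_like_dlrm` and the row values are made up for illustration; they are not DLRM code.

```python
import numpy as np

# Synthetic rows in the assumed Criteo Kaggle layout (values are illustrative).
# train.txt: label, I1..I13, C1..C26   -> 40 tab-separated fields
# test.txt:         I1..I13, C1..C26   -> 39 fields (assumed: no label column)
ints = [str(v) for v in range(1, 14)]
cats = ["5a9ed9b0"] + ["68fd1e64"] * 25
train_row = ["0"] + ints + cats
test_row = ints + cats


def parse_like_dlrm(fields):
    """Hypothetical helper: label in column 0, integer features in columns 1..13."""
    label = int(fields[0])
    x_int = np.array(fields[1:14], dtype=np.int32)
    return label, x_int


print(parse_like_dlrm(train_row))  # parses: column 13 is still an integer feature

try:
    # Without a label, every column shifts left by one, so fields[13] is the
    # first hex categorical value and the base-10 int conversion fails.
    parse_like_dlrm(test_row)
except ValueError as err:
    print("ValueError:", err)
```

If the first line of test.txt really has 39 fields, the preprocessing is reading an unlabelled file with a parser that expects a label in column 0.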

janghobaek2125 commented 1 month ago

Traceback (most recent call last):
  File "/data/janghobaek/jangho/dlrm/dlrm_s_pytorch.py", line 1912, in <module>
    run()
  File "/data/janghobaek/jangho/dlrm/dlrm_s_pytorch.py", line 1108, in run
    train_data, train_ld, test_data, test_ld = dp.make_criteo_data_and_loaders(args)
  File "/data/janghobaek/jangho/dlrm/dlrm_data_pytorch.py", line 520, in make_criteo_data_and_loaders
    train_data = CriteoDataset(
  File "/data/janghobaek/jangho/dlrm/dlrm_data_pytorch.py", line 109, in __init__
    file = data_utils.getCriteoAdData(
  File "/data/janghobaek/jangho/dlrm/data_utils.py", line 1138, in getCriteoAdData
    total_per_file[i] = process_one_file(
  File "/data/janghobaek/jangho/dlrm/data_utils.py", line 1015, in process_one_file
    X_int[i] = np.array(line[1:14], dtype=np.int32)
ValueError: invalid literal for int() with base 10: '5a9ed9b0'
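Assuming the failure comes from test.txt lacking the label column that train.txt has (the Criteo Kaggle test set is distributed without labels), one possible workaround is to prepend a placeholder label to every test.txt row so the columns line up with train.txt before running the existing preprocessing. This is a sketch under that layout assumption; `add_dummy_label`, `pad_file`, and the output file name are hypothetical and not part of DLRM. Because the labels are fake, any loss or accuracy computed on the padded file is meaningless; only the model's predictions are usable.

```python
N_INT = 13  # integer (dense) features per row
N_CAT = 26  # hex-encoded categorical features per row


def add_dummy_label(line: str, label: str = "0") -> str:
    """Prepend a placeholder label so an unlabelled row gets the train.txt layout."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 1 + N_INT + N_CAT:
        # Row already has a label column; keep it unchanged.
        return line if line.endswith("\n") else line + "\n"
    if len(fields) != N_INT + N_CAT:
        raise ValueError(f"unexpected number of fields: {len(fields)}")
    return "\t".join([label] + fields) + "\n"


def pad_file(src: str, dst: str) -> None:
    """Stream src line by line and write the padded rows to dst."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(add_dummy_label(line))
```

For example, `pad_file("/data/janghobaek/test.txt", "/data/janghobaek/test_with_dummy_label.txt")` writes a padded copy that could then be passed as `--raw-data-file`. Streaming keeps memory flat, which matters for a multi-gigabyte file.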