facebookresearch / dlrm

An implementation of a deep learning recommendation model (DLRM)
MIT License

Bind to cores for inference test #379

Closed: Qinghe12 closed this issue 5 months ago

Qinghe12 commented 5 months ago

When I run a test with the following command:

python3 dlrm_s_pytorch.py --arch-sparse-feature-size=16 --arch-mlp-bot="13-512-256-64-16" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --processed-data-file=input/kaggleAdDisplayChallenge_processed.npz --loss-function=bce --round-targets=True --learning-rate=0.1 --print-freq=1024 --print-time --inference-only --mini-batch-size=128 --num-batches=4096

It runs normally and finishes in about 30 seconds (all cores except cores 0-3 are offline).

But when I run a test with the following command:

taskset -c 0-3 python3 dlrm_s_pytorch.py --arch-sparse-feature-size=16 --arch-mlp-bot="13-512-256-64-16" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --raw-data-file=input/train.txt --processed-data-file=input/kaggleAdDisplayChallenge_processed.npz --loss-function=bce --round-targets=True --learning-rate=0.1 --print-freq=1024 --print-time --num-workers=0 --inference-only --mini-batch-size=128 --num-batches=4096

It runs abnormally: it does not finish even after 30 minutes.

What happened?
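
To help narrow this down, a small check like the one below could be run under both invocations (with and without taskset) to confirm which CPUs the process is actually allowed to use. This is only a diagnostic sketch, assuming Linux and the Python standard library. `describe_cpu_binding` is a hypothetical helper and is not part of dlrm_s_pytorch.py, and reading OMP_NUM_THREADS is just one setting that might be worth comparing, not a confirmed cause.

```python
import os


def describe_cpu_binding():
    """Summarize which CPUs the current process is allowed to run on.

    Hypothetical diagnostic helper, not part of DLRM.
    """
    # os.sched_getaffinity exists on Linux; elsewhere, fall back to
    # assuming every CPU reported by os.cpu_count() is usable.
    if hasattr(os, "sched_getaffinity"):
        allowed = sorted(os.sched_getaffinity(0))
    else:
        allowed = list(range(os.cpu_count() or 1))

    return {
        "allowed_cpus": allowed,                # e.g. the set chosen via taskset -c
        "num_allowed": len(allowed),
        "cpu_count": os.cpu_count(),            # CPUs the OS reports
        "omp_num_threads": os.environ.get("OMP_NUM_THREADS"),  # None if unset
    }


if __name__ == "__main__":
    for key, value in describe_cpu_binding().items():
        print(f"{key}: {value}")
```

Saving this as a separate script and running it once directly and once under taskset -c 0-3 would show whether the allowed CPU set differs between the two cases. Note that the second command above also adds --raw-data-file=input/train.txt and --num-workers=0, so the two runs differ in more than the taskset binding.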