facebookresearch / dlrm

An implementation of a deep learning recommendation model (DLRM)
MIT License

how to train dlrm with multi-gpu #354

Open DONGDILLON opened 1 year ago

DONGDILLON commented 1 year ago

Hi, I have recently been trying to train DLRM on 8 GPUs. The command I use is:

python3 -m torch.distributed.launch --nproc_per_node 4 python3 dlrm_s_pytorch.py --arch-sparse-feature-size=64 --arch-mlp-bot="13-512-256-64" --arch-mlp-top="512-512-256-1" --max-ind-range=10000000 --data-generation=dataset --data-set=terabyte --raw-data-file=./input/day --processed-data-file=./input/terabyte_processed.npz --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=2048 --print-freq=1024 --print-time --test-mini-batch-size=16384 --test-num-workers=16 --use-gpu --dist-backend=nccl

However, it cannot establish a connection between the GPUs. Please help.
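
One likely issue in the command above: torch.distributed.launch already spawns the worker processes with the Python interpreter, so the training script should follow the launcher arguments directly (without a second python3), and --nproc_per_node should match the number of GPUs in use (8 here). Keeping the flags exactly as posted, a corrected form of the same launch would look roughly like:

python3 -m torch.distributed.launch --nproc_per_node=8 dlrm_s_pytorch.py --arch-sparse-feature-size=64 --arch-mlp-bot="13-512-256-64" --arch-mlp-top="512-512-256-1" --max-ind-range=10000000 --data-generation=dataset --data-set=terabyte --raw-data-file=./input/day --processed-data-file=./input/terabyte_processed.npz --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=2048 --print-freq=1024 --print-time --test-mini-batch-size=16384 --test-num-workers=16 --use-gpu --dist-backend=nccl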

mnaumovfb commented 9 months ago

How many GPUs do you have on the machine? Can you try the command from the README (Benchmarking, Section 5, "The code now supports synchronous distributed training ...") and share the error message?

# for a single node with 8 GPUs, nccl backend, and a randomly generated dataset:
python -m torch.distributed.launch --nproc_per_node=8 dlrm_s_pytorch.py --arch-embedding-size="80000-80000-80000-80000-80000-80000-80000-80000" --arch-sparse-feature-size=64 --arch-mlp-bot="128-128-128-128" --arch-mlp-top="512-512-512-256-1" --max-ind-range=40000000 --data-generation=random --loss-function=bce --round-targets=True --learning-rate=1.0 --mini-batch-size=2048 --print-freq=2 --print-time --test-freq=2 --test-mini-batch-size=2048 --memory-map --use-gpu --num-batches=100 --dist-backend=nccl
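
If that README command also fails to set up communication, it may help to verify NCCL connectivity on the node independently of DLRM. Below is a minimal sketch (the script name check_nccl.py and the launch line are illustrative only, not part of this repository): it joins a NCCL process group on each GPU and runs a single all-reduce.

# check_nccl.py -- minimal NCCL all-reduce sanity check (illustrative, not part of DLRM)
import argparse
import os
import torch
import torch.distributed as dist

if __name__ == "__main__":
    # torch.distributed.launch passes --local_rank (newer versions also set LOCAL_RANK)
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int,
                        default=int(os.environ.get("LOCAL_RANK", 0)))
    args = parser.parse_args()

    # bind this process to its GPU and join the NCCL process group
    # (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE are set by the launcher)
    torch.cuda.set_device(args.local_rank)
    dist.init_process_group(backend="nccl", init_method="env://")

    # every rank contributes its rank id; the all-reduce sums them across all GPUs
    t = torch.full((1,), float(dist.get_rank()), device="cuda")
    dist.all_reduce(t)
    print(f"rank {dist.get_rank()}/{dist.get_world_size()}: all_reduce sum = {t.item()}")

    dist.destroy_process_group()

Launched, for example, with python -m torch.distributed.launch --nproc_per_node=8 check_nccl.py. If this hangs or reports NCCL errors, the problem is in the GPU driver / NCCL setup rather than in dlrm_s_pytorch.py, and running with NCCL_DEBUG=INFO usually makes the failing step visible.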