I am able to run python dlrm_s_pytorch.py --mini-batch-size=2 --data-size=6 --use-gpu. But when I try to run dlrm_s_pytorch.py on a single node with multiple GPUs using the nccl backend, it fails.
Here is the command I used:
I got tons of errors:
pytorch2.0.0/torch/distributed/launch.py:181: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects `--local-rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
warnings.warn(
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
Unable to import onnx. No module named 'onnx'
usage: dlrm_s_pytorch.py [-h] [--arch-sparse-feature-size ARCH_SPARSE_FEATURE_SIZE] [--arch-embedding-size ARCH_EMBEDDING_SIZE] [--arch-mlp-bot ARCH_MLP_BOT] [--arch-mlp-top ARCH_MLP_TOP] [--arch-interaction-op {dot,cat}]
[--arch-interaction-itself] [--weighted-pooling WEIGHTED_POOLING] [--md-flag] [--md-threshold MD_THRESHOLD] [--md-temperature MD_TEMPERATURE] [--md-round-dims] [--qr-flag] [--qr-threshold QR_THRESHOLD]
[--qr-operation QR_OPERATION] [--qr-collisions QR_COLLISIONS] [--activation-function ACTIVATION_FUNCTION] [--loss-function LOSS_FUNCTION] [--loss-weights LOSS_WEIGHTS] [--loss-threshold LOSS_THRESHOLD]
[--round-targets ROUND_TARGETS] [--data-size DATA_SIZE] [--num-batches NUM_BATCHES] [--data-generation DATA_GENERATION] [--rand-data-dist RAND_DATA_DIST] [--rand-data-min RAND_DATA_MIN]
[--rand-data-max RAND_DATA_MAX] [--rand-data-mu RAND_DATA_MU] [--rand-data-sigma RAND_DATA_SIGMA] [--data-trace-file DATA_TRACE_FILE] [--data-set DATA_SET] [--raw-data-file RAW_DATA_FILE]
[--processed-data-file PROCESSED_DATA_FILE] [--data-randomize DATA_RANDOMIZE] [--data-trace-enable-padding DATA_TRACE_ENABLE_PADDING] [--max-ind-range MAX_IND_RANGE]
[--data-sub-sample-rate DATA_SUB_SAMPLE_RATE] [--num-indices-per-lookup NUM_INDICES_PER_LOOKUP] [--num-indices-per-lookup-fixed NUM_INDICES_PER_LOOKUP_FIXED] [--num-workers NUM_WORKERS] [--memory-map]
[--mini-batch-size MINI_BATCH_SIZE] [--nepochs NEPOCHS] [--learning-rate LEARNING_RATE] [--print-precision PRINT_PRECISION] [--numpy-rand-seed NUMPY_RAND_SEED] [--sync-dense-params SYNC_DENSE_PARAMS]
[--optimizer OPTIMIZER] [--dataset-multiprocessing] [--inference-only] [--quantize-mlp-with-bit QUANTIZE_MLP_WITH_BIT] [--quantize-emb-with-bit QUANTIZE_EMB_WITH_BIT] [--save-onnx] [--use-gpu]
[--local_rank LOCAL_RANK] [--dist-backend DIST_BACKEND] [--print-freq PRINT_FREQ] [--test-freq TEST_FREQ] [--test-mini-batch-size TEST_MINI_BATCH_SIZE] [--test-num-workers TEST_NUM_WORKERS] [--print-time]
[--print-wall-time] [--debug-mode] [--enable-profiling] [--plot-compute-graph] [--tensor-board-filename TENSOR_BOARD_FILENAME] [--save-model SAVE_MODEL] [--load-model LOAD_MODEL] [--mlperf-logging]
[--mlperf-acc-threshold MLPERF_ACC_THRESHOLD] [--mlperf-auc-threshold MLPERF_AUC_THRESHOLD] [--mlperf-bin-loader] [--mlperf-bin-shuffle] [--mlperf-grad-accum-iter MLPERF_GRAD_ACCUM_ITER]
[--lr-num-warmup-steps LR_NUM_WARMUP_STEPS] [--lr-decay-start-step LR_DECAY_START_STEP] [--lr-num-decay-steps LR_NUM_DECAY_STEPS]
dlrm_s_pytorch.py: error: unrecognized arguments: --local-rank=1
Unable to import onnx. No module named 'onnx'
usage: dlrm_s_pytorch.py [-h] [--arch-sparse-feature-size ARCH_SPARSE_FEATURE_SIZE] [--arch-embedding-size ARCH_EMBEDDING_SIZE] [--arch-mlp-bot ARCH_MLP_BOT] [--arch-mlp-top ARCH_MLP_TOP] [--arch-interaction-op {dot,cat}]
[--arch-interaction-itself] [--weighted-pooling WEIGHTED_POOLING] [--md-flag] [--md-threshold MD_THRESHOLD] [--md-temperature MD_TEMPERATURE] [--md-round-dims] [--qr-flag] [--qr-threshold QR_THRESHOLD]
[--qr-operation QR_OPERATION] [--qr-collisions QR_COLLISIONS] [--activation-function ACTIVATION_FUNCTION] [--loss-function LOSS_FUNCTION] [--loss-weights LOSS_WEIGHTS] [--loss-threshold LOSS_THRESHOLD]
[--round-targets ROUND_TARGETS] [--data-size DATA_SIZE] [--num-batches NUM_BATCHES] [--data-generation DATA_GENERATION] [--rand-data-dist RAND_DATA_DIST] [--rand-data-min RAND_DATA_MIN]
[--rand-data-max RAND_DATA_MAX] [--rand-data-mu RAND_DATA_MU] [--rand-data-sigma RAND_DATA_SIGMA] [--data-trace-file DATA_TRACE_FILE] [--data-set DATA_SET] [--raw-data-file RAW_DATA_FILE]
[--processed-data-file PROCESSED_DATA_FILE] [--data-randomize DATA_RANDOMIZE] [--data-trace-enable-padding DATA_TRACE_ENABLE_PADDING] [--max-ind-range MAX_IND_RANGE]
[--data-sub-sample-rate DATA_SUB_SAMPLE_RATE] [--num-indices-per-lookup NUM_INDICES_PER_LOOKUP] [--num-indices-per-lookup-fixed NUM_INDICES_PER_LOOKUP_FIXED] [--num-workers NUM_WORKERS] [--memory-map]
[--mini-batch-size MINI_BATCH_SIZE] [--nepochs NEPOCHS] [--learning-rate LEARNING_RATE] [--print-precision PRINT_PRECISION] [--numpy-rand-seed NUMPY_RAND_SEED] [--sync-dense-params SYNC_DENSE_PARAMS]
[--optimizer OPTIMIZER] [--dataset-multiprocessing] [--inference-only] [--quantize-mlp-with-bit QUANTIZE_MLP_WITH_BIT] [--quantize-emb-with-bit QUANTIZE_EMB_WITH_BIT] [--save-onnx] [--use-gpu]
[--local_rank LOCAL_RANK] [--dist-backend DIST_BACKEND] [--print-freq PRINT_FREQ] [--test-freq TEST_FREQ] [--test-mini-batch-size TEST_MINI_BATCH_SIZE] [--test-num-workers TEST_NUM_WORKERS] [--print-time]
[--print-wall-time] [--debug-mode] [--enable-profiling] [--plot-compute-graph] [--tensor-board-filename TENSOR_BOARD_FILENAME] [--save-model SAVE_MODEL] [--load-model LOAD_MODEL] [--mlperf-logging]
[--mlperf-acc-threshold MLPERF_ACC_THRESHOLD] [--mlperf-auc-threshold MLPERF_AUC_THRESHOLD] [--mlperf-bin-loader] [--mlperf-bin-shuffle] [--mlperf-grad-accum-iter MLPERF_GRAD_ACCUM_ITER]
[--lr-num-warmup-steps LR_NUM_WARMUP_STEPS] [--lr-decay-start-step LR_DECAY_START_STEP] [--lr-num-decay-steps LR_NUM_DECAY_STEPS]
dlrm_s_pytorch.py: error: unrecognized arguments: --local-rank=0
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 375622) of binary: /home/xxx/.conda/envs/torch2.0/bin/python
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/xxxpkg/pytorch2.0.0/torch/distributed/launch.py", line 196, in <module>
    main()
  File "/home/xxx/pkg/pytorch2.0.0/torch/distributed/launch.py", line 192, in main
    launch(args)
  File "/home/xxx/pkg/pytorch2.0.0/torch/distributed/launch.py", line 177, in launch
    run(args)
  File "/home/xxx/pkg/pytorch2.0.0/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/xxx/pkg/pytorch2.0.0/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/pkg/pytorch2.0.0/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
dlrm_s_pytorch.py FAILED
I used the command listed in README.md. Is that no longer the correct command (and if so, what is the right one), or could you tell me more about what I did wrong?
Can you try the workaround suggested in the error message? In other words, rather than using args.local_rank here, try printing and passing along os.environ['LOCAL_RANK'].
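A minimal sketch of that workaround, assuming the goal is to make the script tolerate both ways the launcher can communicate the local rank. The log shows the launcher passing --local-rank (hyphen) while the script only registers --local_rank (underscore), so the parser below accepts both spellings and falls back to the LOCAL_RANK environment variable. The helper name parse_local_rank is hypothetical and is not part of dlrm_s_pytorch.py; in the real script the add_argument line would replace the existing --local_rank definition.

```python
import argparse
import os


def parse_local_rank(argv=None, environ=None):
    """Return the local rank from CLI args or LOCAL_RANK (-1 if neither is set)."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # Accept both spellings: the script registers --local_rank, while the
    # launcher in the log above passes --local-rank.
    parser.add_argument(
        "--local-rank", "--local_rank", dest="local_rank", type=int, default=-1
    )
    # parse_known_args skips the script's other flags in this isolated sketch.
    args, _unknown = parser.parse_known_args(argv)
    if args.local_rank >= 0:
        return args.local_rank
    # torchrun exports LOCAL_RANK in the environment instead of passing a flag.
    return int(environ.get("LOCAL_RANK", -1))


if __name__ == "__main__":
    print("local rank:", parse_local_rank())
```

With this in place, the value comes from whichever source the launcher provides, so the same script should work under both torch.distributed.launch and torchrun.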
Thanks in advance! Best, Yuxin