facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

local_rank error #2859

Closed. getao closed this issue 4 years ago.

getao commented 4 years ago

I used distributed training, following the instructions here: https://fairseq.readthedocs.io/en/latest/getting_started.html#distributed-training.

However, I got a local rank argument error:

```
usage: fairseq-train [-h] [--no-progress-bar] [--log-interval LOG_INTERVAL] [--log-format {json,none,simple,tqdm}] [--tensorboard-logdir TENSORBOARD_LOGDIR] [--wandb-project WANDB_PROJECT] [--seed SEED] [--cpu] [--tpu] [--bf16] [--memory-efficient-bf16] [--fp16] [--memory-efficient-fp16] [--fp16-no-flatten-grads] [--fp16-init-scale FP16_INIT_SCALE] [--fp16-scale-window FP16_SCALE_WINDOW] [--fp16-scale-tolerance FP16_SCALE_TOLERANCE] [--min-loss-scale MIN_LOSS_SCALE] [--threshold-loss-scale THRESHOLD_LOSS_SCALE] [--user-dir USER_DIR] [--empty-cache-freq EMPTY_CACHE_FREQ] [--all-gather-list-size ALL_GATHER_LIST_SIZE] [--model-parallel-size MODEL_PARALLEL_SIZE] [--quantization-config-path QUANTIZATION_CONFIG_PATH] [--profile] [--tokenizer {nltk,space,moses}] [--bpe {bert,bytes,hf_byte_bpe,characters,fastbpe,sentencepiece,subword_nmt,gpt2,byte_bpe}] [--criterion {wav2vec,cross_entropy,nat_loss,sentence_prediction,composite_loss,legacy_masked_lm_loss,label_smoothed_cross_entropy,label_smoothed_cross_entropy_with_alignment,adaptive_loss,sentence_ranking,ctc,masked_lm,vocab_parallel_cross_entropy}] [--optimizer {adagrad,adamax,adadelta,sgd,adafactor,lamb,nag,adam}] [--lr-scheduler {fixed,inverse_sqrt,tri_stage,triangular,cosine,reduce_lr_on_plateau,polynomial_decay}] [--scoring {sacrebleu,bleu,wer,chrf}] [--task TASK] [--num-workers NUM_WORKERS] [--skip-invalid-size-inputs-valid-test] [--max-tokens MAX_TOKENS] [--batch-size BATCH_SIZE] [--required-batch-size-multiple REQUIRED_BATCH_SIZE_MULTIPLE] [--required-seq-len-multiple REQUIRED_SEQ_LEN_MULTIPLE] [--dataset-impl {raw,lazy,cached,mmap,fasta}] [--data-buffer-size DATA_BUFFER_SIZE] [--train-subset TRAIN_SUBSET] [--valid-subset VALID_SUBSET] [--validate-interval VALIDATE_INTERVAL] [--validate-interval-updates VALIDATE_INTERVAL_UPDATES] [--validate-after-updates VALIDATE_AFTER_UPDATES] [--fixed-validation-seed FIXED_VALIDATION_SEED] [--disable-validation] [--max-tokens-valid MAX_TOKENS_VALID] [--batch-size-valid BATCH_SIZE_VALID] [--curriculum CURRICULUM] [--gen-subset GEN_SUBSET] [--num-shards NUM_SHARDS] [--shard-id SHARD_ID] [--distributed-world-size DISTRIBUTED_WORLD_SIZE] [--distributed-rank DISTRIBUTED_RANK] [--distributed-backend DISTRIBUTED_BACKEND] [--distributed-init-method DISTRIBUTED_INIT_METHOD] [--distributed-port DISTRIBUTED_PORT] [--device-id DEVICE_ID] [--local-rank LOCAL_RANK] [--distributed-no-spawn] [--ddp-backend {c10d,no_c10d}] [--bucket-cap-mb BUCKET_CAP_MB] [--fix-batches-to-gpus] [--find-unused-parameters] [--fast-stat-sync] [--broadcast-buffers] [--distributed-wrapper {DDP,SlowMo}] [--slowmo-momentum SLOWMO_MOMENTUM] [--slowmo-algorithm SLOWMO_ALGORITHM] [--localsgd-frequency LOCALSGD_FREQUENCY] [--nprocs-per-node NPROCS_PER_NODE] [--pipeline-model-parallel] [--pipeline-balance PIPELINE_BALANCE] [--pipeline-devices PIPELINE_DEVICES] [--pipeline-chunks PIPELINE_CHUNKS] [--pipeline-encoder-balance PIPELINE_ENCODER_BALANCE] [--pipeline-encoder-devices PIPELINE_ENCODER_DEVICES] [--pipeline-decoder-balance PIPELINE_DECODER_BALANCE] [--pipeline-decoder-devices PIPELINE_DECODER_DEVICES] [--pipeline-checkpoint {always,never,except_last}] [--zero-sharding {none,os}] [--arch ARCH] [--max-epoch MAX_EPOCH] [--max-update MAX_UPDATE] [--stop-time-hours STOP_TIME_HOURS] [--clip-norm CLIP_NORM] [--sentence-avg] [--update-freq UPDATE_FREQ] [--lr LR] [--min-lr MIN_LR] [--use-bmuf] [--save-dir SAVE_DIR] [--restore-file RESTORE_FILE] [--finetune-from-model FINETUNE_FROM_MODEL] [--reset-dataloader] [--reset-lr-scheduler] [--reset-meters] [--reset-optimizer] [--optimizer-overrides OPTIMIZER_OVERRIDES] [--save-interval SAVE_INTERVAL] [--save-interval-updates SAVE_INTERVAL_UPDATES] [--keep-interval-updates KEEP_INTERVAL_UPDATES] [--keep-last-epochs KEEP_LAST_EPOCHS] [--keep-best-checkpoints KEEP_BEST_CHECKPOINTS] [--no-save] [--no-epoch-checkpoints] [--no-last-checkpoints] [--no-save-optimizer-state] [--best-checkpoint-metric BEST_CHECKPOINT_METRIC] [--maximize-best-checkpoint-metric] [--patience PATIENCE] [--checkpoint-suffix CHECKPOINT_SUFFIX] [--checkpoint-shard-count CHECKPOINT_SHARD_COUNT] [--activation-fn {relu,gelu,gelu_fast,gelu_accurate,tanh,linear}] [--dropout D] [--attention-dropout D] [--activation-dropout D] [--encoder-embed-path STR] [--encoder-embed-dim N] [--encoder-ffn-embed-dim N] [--encoder-layers N] [--encoder-attention-heads N] [--encoder-normalize-before] [--encoder-learned-pos] [--decoder-embed-path STR] [--decoder-embed-dim N] [--decoder-ffn-embed-dim N] [--decoder-layers N] [--decoder-attention-heads N] [--decoder-learned-pos] [--decoder-normalize-before] [--decoder-output-dim N] [--share-decoder-input-output-embed] [--share-all-embeddings] [--no-token-positional-embeddings] [--adaptive-softmax-cutoff EXPR] [--adaptive-softmax-dropout D] [--layernorm-embedding] [--no-scale-embedding] [--checkpoint-activations] [--no-cross-attention] [--cross-self-attention] [--encoder-layerdrop D] [--decoder-layerdrop D] [--encoder-layers-to-keep ENCODER_LAYERS_TO_KEEP] [--decoder-layers-to-keep DECODER_LAYERS_TO_KEEP] [--quant-noise-pq D] [--quant-noise-pq-block-size D] [--quant-noise-scalar D] [--pooler-dropout D] [--pooler-activation-fn {relu,gelu,gelu_fast,gelu_accurate,tanh,linear}] [--spectral-norm-classification-head] [-s SRC] [-t TARGET] [--load-alignments] [--left-pad-source BOOL] [--left-pad-target BOOL] [--max-source-positions N] [--max-target-positions N] [--upsample-primary UPSAMPLE_PRIMARY] [--truncate-source] [--num-batch-buckets N] [--eval-bleu] [--eval-bleu-detok EVAL_BLEU_DETOK] [--eval-bleu-detok-args JSON] [--eval-tokenized-bleu] [--eval-bleu-remove-bpe [EVAL_BLEU_REMOVE_BPE]] [--eval-bleu-args JSON] [--eval-bleu-print-samples] [--label-smoothing D] [--report-accuracy] [--ignore-prefix-size IGNORE_PREFIX_SIZE] [--adam-betas ADAM_BETAS] [--adam-eps ADAM_EPS] [--weight-decay WEIGHT_DECAY] [--use-old-adam] [--force-anneal N] [--warmup-updates N] [--end-learning-rate END_LEARNING_RATE] [--power POWER] [--total-num-update TOTAL_NUM_UPDATE] [--pad PAD] [--eos EOS] [--unk UNK] data

fairseq-train: error: unrecognized arguments: --local_rank=3
```

It seems that fairseq expects --local-rank, but in practice the launcher passes --local_rank (with an underscore).

Is there any solution to it?

Thanks.

What's your environment?

fairseq (master: Nov 4, 2020)

alexeib commented 4 years ago

have you tried --local-rank?

getao commented 4 years ago

> have you tried --local-rank?

I didn't specify --local-rank in my command line:

```
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=3 --node_rank=$2 --master_addr=$1 --master_port=12345 \
    $(which fairseq-train) cnn_dm-bin \
    --restore-file $BART_PATH \
    --max-tokens $MAX_TOKENS \
    --task translation \
    --source-lang source --target-lang target \
    --truncate-source \
    --layernorm-embedding \
    --share-all-embeddings \
    --share-decoder-input-output-embed \
    --reset-optimizer --reset-dataloader --reset-meters \
    --required-batch-size-multiple 1 \
    --arch bart_large \
    --criterion label_smoothed_cross_entropy \
    --label-smoothing 0.1 \
    --dropout 0.1 --attention-dropout 0.1 \
    --weight-decay 0.01 --optimizer adam --adam-betas "(0.9, 0.999)" --adam-eps 1e-08 \
    --clip-norm 0.1 \
    --lr-scheduler polynomial_decay --lr $LR --total-num-update $TOTAL_NUM_UPDATES --warmup-updates $WARMUP_UPDATES \
    --fp16 --update-freq $UPDATE_FREQ \
    --skip-invalid-size-inputs-valid-test \
    --find-unused-parameters;
```

The error log is as follows:

```
fairseq-train: error: unrecognized arguments: --local_rank=7
Traceback (most recent call last):
  File "/home/msrauser/anaconda3/envs/pytorch1.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/msrauser/anaconda3/envs/pytorch1.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/msrauser/anaconda3/envs/pytorch1.6/lib/python3.6/site-packages/torch/distributed/launch.py", line 261, in <module>
    main()
  File "/home/msrauser/anaconda3/envs/pytorch1.6/lib/python3.6/site-packages/torch/distributed/launch.py", line 257, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/msrauser/anaconda3/envs/pytorch1.6/bin/python', '-u', '/home/msrauser/anaconda3/envs/pytorch1.6/bin/fairseq-train', '--local_rank=7', xxxxx
```

myleott commented 4 years ago

Ah yeah, python -m torch.distributed.launch will only populate --local_rank (with an underscore): https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py#L268

@alexeib, can we add an alias?
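
For reference, argparse supports such an alias directly: one argument can register several option strings, and both spellings then parse to the same destination. A minimal sketch of the idea (hypothetical, not the actual fairseq patch):

```python
import argparse

parser = argparse.ArgumentParser()
# Both spellings map to the same destination; argparse derives the dest
# ("local_rank") from the first long option string.
parser.add_argument(
    "--local-rank", "--local_rank",
    type=int,
    default=0,
    help="local rank passed by torch.distributed.launch",
)

args = parser.parse_args(["--local_rank=3"])
print(args.local_rank)  # prints: 3
```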

getao commented 4 years ago

> Ah yeah, python -m torch.distributed.launch will only populate --local_rank (with an underscore): https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py#L268
>
> @alexeib, can we add an alias?

Thanks for your answer. Is there any workaround for now?
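
One possible workaround, not confirmed by the maintainers in this thread: since the mismatch is purely in the flag's spelling, a tiny wrapper script can rewrite the flag before fairseq parses it, and torch.distributed.launch can be pointed at the wrapper instead of fairseq-train. This sketch assumes fairseq_cli.train.cli_main is the console entry point behind fairseq-train:

```python
# train_shim.py -- hypothetical wrapper: torch.distributed.launch passes
# --local_rank=N, but fairseq-train only recognizes --local-rank, so
# rewrite the flag in sys.argv before handing control to fairseq's CLI.
import sys

from fairseq_cli.train import cli_main

if __name__ == "__main__":
    sys.argv = [arg.replace("--local_rank", "--local-rank") for arg in sys.argv]
    cli_main()
```

Launched as `python -m torch.distributed.launch ... train_shim.py <usual fairseq-train args>`, the rest of the command line passes through unchanged.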

magiczixiao commented 2 years ago

I encountered the same problem when using fairseq-hydra-train to pretrain a wav2vec 2.0 model:

fairseq-hydra-train: error: unrecognized arguments: --local_rank=0

Here is the command:

```
python -m torch.distributed.launch --nproc_per_node=1 \
    --nnodes=2 --node_rank=0 --master_addr="192.168.24.42" \
    --master_port=12345 \
    ./fairseq-hydra-train task.data=my_data_set \
    --config-dir ./fairseq-main/examples/wav2vec/config/pretraining \
    --config-name my_config
```

Could you give some advice on how to use fairseq-hydra-train to train on multiple nodes? Extremely grateful.
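
One direction worth trying, sketched under assumptions rather than taken from this thread: fairseq-hydra-train spawns its own per-GPU worker processes, so instead of going through torch.distributed.launch you can pass fairseq's distributed_training.* overrides directly. The field names below come from fairseq's DistributedTrainingConfig dataclass; the address, port, and world size are placeholders matching the command above:

```sh
# Run once per node; the nodes rendezvous via the TCP init method.
# Node 0 (2 nodes x 1 GPU => world size 2):
fairseq-hydra-train task.data=my_data_set \
    distributed_training.distributed_world_size=2 \
    distributed_training.distributed_init_method='tcp://192.168.24.42:12345' \
    distributed_training.distributed_rank=0 \
    --config-dir ./fairseq-main/examples/wav2vec/config/pretraining \
    --config-name my_config
# Node 1: the same command with distributed_training.distributed_rank=1
```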

Poeroz commented 2 years ago

> I encountered the same problem when using fairseq-hydra-train to pretrain a wav2vec 2.0 model: fairseq-hydra-train: error: unrecognized arguments: --local_rank=0 […]

Same problem. Have you solved this now?

CaitlinZOO commented 2 years ago

> > I encountered the same problem when using fairseq-hydra-train to pretrain a wav2vec 2.0 model […]
>
> Same problem. Have you solved this now?

I also faced this problem when using fairseq-hydra-train.

Rongjiehuang commented 2 years ago

@magiczixiao @Poeroz @CaitlinZOO Same problem. Have you solved this now?

flckv commented 1 year ago

```
fairseq-hydra-train: error: unrecognized arguments: --restore-file /home/user/outputs/2023-05-22/07-19-53/checkpoints/checkpoint_last.pt
```

This happened when I tried to continue pretraining wav2vec.
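
This one is likely a different flavor of the same mismatch: fairseq-hydra-train takes Hydra-style key=value overrides, not argparse flags, so --restore-file is never recognized. Assuming the standard fairseq checkpoint config group (CheckpointConfig.restore_file), the override would look like the sketch below; the config dir and name are placeholders:

```sh
# Hydra override instead of an argparse flag (config dir/name assumed):
fairseq-hydra-train \
    checkpoint.restore_file=/home/user/outputs/2023-05-22/07-19-53/checkpoints/checkpoint_last.pt \
    --config-dir path/to/config/dir \
    --config-name my_config
```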

Saoussenl commented 10 months ago

Hello, why is this issue closed when the problem isn't really solved?