OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Apache License 2.0


train.py: error: unrecognized arguments: --warmup-ratio=0.01 #202

Closed: EdmunddzzZ closed this issue 2 years ago.

EdmunddzzZ commented 2 years ago

This happens when I run `bash pretrain_ofa_base.sh`. I am running it on a single GPU.

Full error output:

/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions
  FutureWarning,
2022-08-12 08:40:15 - instantiator.py[line:21] - INFO: Created a temporary directory at /tmp/tmpp7xat_v4
2022-08-12 08:40:15 - instantiator.py[line:76] - INFO: Writing /tmp/tmpp7xat_v4/_remote_module_non_scriptable.py
2022-08-12 08:40:15 - utils.py[line:160] - INFO: NumExpr defaulting to 2 threads.
usage: train.py [-h] [--no-progress-bar] [--log-interval LOG_INTERVAL] [--log-format {json,none,simple,tqdm}] [--log-file LOG_FILE] [--aim-repo AIM_REPO] [--aim-run-hash AIM_RUN_HASH] [--tensorboard-logdir TENSORBOARD_LOGDIR] [--wandb-project WANDB_PROJECT] [--azureml-logging] [--seed SEED] [--cpu] [--tpu] [--bf16] [--memory-efficient-bf16] [--fp16] [--memory-efficient-fp16] [--fp16-no-flatten-grads] [--fp16-init-scale FP16_INIT_SCALE] [--fp16-scale-window FP16_SCALE_WINDOW] [--fp16-scale-tolerance FP16_SCALE_TOLERANCE] [--on-cpu-convert-precision] [--min-loss-scale MIN_LOSS_SCALE] [--threshold-loss-scale THRESHOLD_LOSS_SCALE] [--amp] [--amp-batch-retries AMP_BATCH_RETRIES] [--amp-init-scale AMP_INIT_SCALE] [--amp-scale-window AMP_SCALE_WINDOW] [--user-dir USER_DIR] [--empty-cache-freq EMPTY_CACHE_FREQ] [--all-gather-list-size ALL_GATHER_LIST_SIZE] [--model-parallel-size MODEL_PARALLEL_SIZE] [--quantization-config-path QUANTIZATION_CONFIG_PATH] [--profile] [--reset-logging] [--suppress-crashes] [--use-plasma-view] [--plasma-path PLASMA_PATH] [--criterion
{adaptive_loss,composite_loss,cross_entropy,ctc,fastspeech2,hubert,label_smoothed_cross_entropy,latency_augmented_label_smoothed_cross_entropy,label_smoothed_cross_entropy_with_alignment,label_smoothed_cross_entropy_with_ctc,legacy_masked_lm_loss,masked_lm,model,nat_loss,sentence_prediction,sentence_prediction_adapters,sentence_ranking,tacotron2,speech_to_unit,speech_to_spectrogram,speech_unit_lm_criterion,wav2vec,vocab_parallel_cross_entropy,scst_reward_criterion,adjust_label_smoothed_cross_entropy,clip_scst_reward_criterion,adjust_label_smoothed_encouraging_loss}] [--tokenizer {moses,nltk,space}] [--bpe {byte_bpe,bytes,characters,fastbpe,gpt2,bert,hf_byte_bpe,sentencepiece,subword_nmt}] [--optimizer {adadelta,adafactor,adagrad,adam,adamax,composite,cpu_adam,lamb,nag,sgd}] [--lr-scheduler {cosine,fixed,inverse_sqrt,manual,pass_through,polynomial_decay,reduce_lr_on_plateau,step,tri_stage,triangular}] [--scoring {bert_score,sacrebleu,bleu,chrf,meteor,wer}] [--task TASK] [--num-workers NUM_WORKERS] [--skip-invalid-size-inputs-valid-test] [--max-tokens MAX_TOKENS] [--batch-size BATCH_SIZE] [--required-batch-size-multiple REQUIRED_BATCH_SIZE_MULTIPLE] [--required-seq-len-multiple REQUIRED_SEQ_LEN_MULTIPLE] [--dataset-impl {raw,lazy,cached,mmap,fasta,huffman}] [--data-buffer-size DATA_BUFFER_SIZE] [--train-subset TRAIN_SUBSET] [--valid-subset VALID_SUBSET] [--combine-valid-subsets] [--ignore-unused-valid-subsets] [--validate-interval VALIDATE_INTERVAL] [--validate-interval-updates VALIDATE_INTERVAL_UPDATES] [--validate-after-updates VALIDATE_AFTER_UPDATES] [--fixed-validation-seed FIXED_VALIDATION_SEED] [--disable-validation] [--max-tokens-valid MAX_TOKENS_VALID] [--batch-size-valid BATCH_SIZE_VALID] [--max-valid-steps MAX_VALID_STEPS] [--curriculum CURRICULUM] [--gen-subset GEN_SUBSET] [--num-shards NUM_SHARDS] [--shard-id SHARD_ID] [--grouped-shuffling] [--update-epoch-batch-itr UPDATE_EPOCH_BATCH_ITR] [--update-ordered-indices-seed] [--distributed-world-size 
DISTRIBUTED_WORLD_SIZE] [--distributed-num-procs DISTRIBUTED_NUM_PROCS] [--distributed-rank DISTRIBUTED_RANK] [--distributed-backend DISTRIBUTED_BACKEND] [--distributed-init-method DISTRIBUTED_INIT_METHOD] [--distributed-port DISTRIBUTED_PORT] [--device-id DEVICE_ID] [--distributed-no-spawn] [--ddp-backend {c10d,fully_sharded,legacy_ddp,no_c10d,pytorch_ddp,slowmo}] [--ddp-comm-hook {none,fp16}] [--bucket-cap-mb BUCKET_CAP_MB] [--fix-batches-to-gpus] [--find-unused-parameters] [--gradient-as-bucket-view] [--fast-stat-sync] [--heartbeat-timeout HEARTBEAT_TIMEOUT] [--broadcast-buffers] [--slowmo-momentum SLOWMO_MOMENTUM] [--slowmo-base-algorithm SLOWMO_BASE_ALGORITHM] [--localsgd-frequency LOCALSGD_FREQUENCY] [--nprocs-per-node NPROCS_PER_NODE] [--pipeline-model-parallel] [--pipeline-balance PIPELINE_BALANCE] [--pipeline-devices PIPELINE_DEVICES] [--pipeline-chunks PIPELINE_CHUNKS] [--pipeline-encoder-balance PIPELINE_ENCODER_BALANCE] [--pipeline-encoder-devices PIPELINE_ENCODER_DEVICES] [--pipeline-decoder-balance PIPELINE_DECODER_BALANCE] [--pipeline-decoder-devices PIPELINE_DECODER_DEVICES] [--pipeline-checkpoint {always,never,except_last}] [--zero-sharding {none,os}] [--no-reshard-after-forward] [--fp32-reduce-scatter] [--cpu-offload] [--use-sharded-state] [--not-fsdp-flatten-parameters] [--arch ARCH] [--max-epoch MAX_EPOCH] [--max-update MAX_UPDATE] [--stop-time-hours STOP_TIME_HOURS] [--clip-norm CLIP_NORM] [--sentence-avg] [--update-freq UPDATE_FREQ] [--lr LR] [--stop-min-lr STOP_MIN_LR] [--use-bmuf] [--skip-remainder-batch] [--save-dir SAVE_DIR] [--restore-file RESTORE_FILE] [--continue-once CONTINUE_ONCE] [--finetune-from-model FINETUNE_FROM_MODEL] [--reset-dataloader] [--reset-lr-scheduler] [--reset-meters] [--reset-optimizer] [--optimizer-overrides OPTIMIZER_OVERRIDES] [--save-interval SAVE_INTERVAL] [--save-interval-updates SAVE_INTERVAL_UPDATES] [--keep-interval-updates KEEP_INTERVAL_UPDATES] [--keep-interval-updates-pattern KEEP_INTERVAL_UPDATES_PATTERN] 
[--keep-last-epochs KEEP_LAST_EPOCHS] [--keep-best-checkpoints KEEP_BEST_CHECKPOINTS] [--no-save] [--no-epoch-checkpoints] [--no-last-checkpoints] [--no-save-optimizer-state] [--best-checkpoint-metric BEST_CHECKPOINT_METRIC] [--maximize-best-checkpoint-metric] [--patience PATIENCE] [--checkpoint-suffix CHECKPOINT_SUFFIX] [--checkpoint-shard-count CHECKPOINT_SHARD_COUNT] [--load-checkpoint-on-all-dp-ranks] [--write-checkpoints-asynchronously] [--store-ema] [--ema-decay EMA_DECAY] [--ema-start-update EMA_START_UPDATE] [--ema-seed-model EMA_SEED_MODEL] [--ema-update-freq EMA_UPDATE_FREQ] [--ema-fp32] [--activation-fn {relu,gelu,gelu_fast,gelu_accurate,tanh,linear}] [--dropout D] [--attention-dropout D] [--activation-dropout D] [--encoder-embed-path STR] [--encoder-embed-dim N] [--encoder-ffn-embed-dim N] [--encoder-layers N] [--encoder-attention-heads N] [--encoder-normalize-before] [--encoder-learned-pos] [--decoder-embed-path STR] [--decoder-embed-dim N] [--decoder-ffn-embed-dim N] [--decoder-layers N] [--decoder-attention-heads N] [--decoder-learned-pos] [--decoder-normalize-before] [--decoder-output-dim N] [--share-decoder-input-output-embed] [--share-all-embeddings] [--no-token-positional-embeddings] [--adaptive-softmax-cutoff EXPR] [--adaptive-softmax-dropout D] [--layernorm-embedding] [--no-scale-embedding] [--checkpoint-activations] [--offload-activations] [--no-cross-attention] [--cross-self-attention] [--encoder-layerdrop D] [--decoder-layerdrop D] [--encoder-layers-to-keep ENCODER_LAYERS_TO_KEEP] [--decoder-layers-to-keep DECODER_LAYERS_TO_KEEP] [--quant-noise-pq D] [--quant-noise-pq-block-size D] [--quant-noise-scalar D] [--min-params-to-wrap D] [--resnet-drop-path-rate RESNET_DROP_PATH_RATE] [--encoder-drop-path-rate ENCODER_DROP_PATH_RATE] [--decoder-drop-path-rate DECODER_DROP_PATH_RATE] [--token-bucket-size TOKEN_BUCKET_SIZE] [--image-bucket-size IMAGE_BUCKET_SIZE] [--attn-scale-factor ATTN_SCALE_FACTOR] [--freeze-resnet] [--freeze-encoder-embedding] 
[--freeze-decoder-embedding] [--add-type-embedding] [--interpolate-position] [--resnet-type {resnet50,resnet101,resnet152}] [--resnet-model-path STR] [--code-image-size CODE_IMAGE_SIZE] [--patch-layernorm-embedding] [--code-layernorm-embedding] [--entangle-position-embedding] [--disable-entangle] [--sync-bn] [--scale-attn] [--scale-fc] [--scale-heads] [--scale-resids] [--pooler-dropout D] [--pooler-classifier {mlp,linear}] [--pooler-activation-fn {relu,gelu,gelu_fast,gelu_accurate,tanh,linear}] [--spectral-norm-classification-head] [--selected-cols SELECTED_COLS] [--bpe-dir BPE_DIR] [--max-source-positions MAX_SOURCE_POSITIONS] [--max-target-positions MAX_TARGET_POSITIONS] [--max-src-length MAX_SRC_LENGTH] [--max-tgt-length MAX_TGT_LENGTH] [--code-dict-size CODE_DICT_SIZE] [--patch-image-size PATCH_IMAGE_SIZE] [--orig-patch-image-size ORIG_PATCH_IMAGE_SIZE] [--num-bins NUM_BINS] [--imagenet-default-mean-and-std] [--constraint-range CONSTRAINT_RANGE] [--max-image-size MAX_IMAGE_SIZE] [--text-data TEXT_DATA] [--image-data IMAGE_DATA] [--detection-data DETECTION_DATA] [--text-selected-cols TEXT_SELECTED_COLS] [--image-selected-cols IMAGE_SELECTED_COLS] [--detection-selected-cols DETECTION_SELECTED_COLS] [--neg-sample-dir NEG_SAMPLE_DIR] [--pretrain-seed PRETRAIN_SEED] [--mask-ratio MASK_RATIO] [--random-ratio RANDOM_RATIO] [--keep-ratio KEEP_RATIO] [--mask-length MASK_LENGTH] [--poisson-lambda POISSON_LAMBDA] [--replace-length REPLACE_LENGTH] [--label-smoothing LABEL_SMOOTHING] [--report-accuracy] [--ignore-prefix-size IGNORE_PREFIX_SIZE] [--ignore-eos] [--drop-worst-ratio DROP_WORST_RATIO] [--drop-worst-after DROP_WORST_AFTER] [--use-rdrop] [--reg-alpha REG_ALPHA] [--sample-patch-num SAMPLE_PATCH_NUM] [--adam-betas ADAM_BETAS] [--adam-eps ADAM_EPS] [--weight-decay WEIGHT_DECAY] [--use-old-adam] [--fp16-adam-stats] [--warmup-updates WARMUP_UPDATES] [--force-anneal FORCE_ANNEAL] [--end-learning-rate END_LEARNING_RATE] [--power POWER] [--total-num-update 
TOTAL_NUM_UPDATE] [--pad PAD] [--eos EOS] [--unk UNK] data
train.py: error: unrecognized arguments: --warmup-steps=0.01
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 2543) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/run.py", line 755, in run
    )(*cmd_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/api.py", line 247, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

/content/OFA/train.py FAILED

Failures:

------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-08-12_08:40:18
  host      : d5f687eca1c2
  rank      : 0 (local_rank: 0)
  exitcode  : 2 (pid: 2543)
  error_file:
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
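For context, the usage message printed by train.py lists `--warmup-updates` but neither `--warmup-ratio` (from the issue title) nor `--warmup-steps` (from the error line). The sketch below uses a hypothetical, minimal parser, not the real OFA/fairseq one, to show how Python's `argparse` treats an option it does not define: it prints the `unrecognized arguments` error and exits with status 2, which matches the `exitcode : 2` reported above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for train.py's parser: it defines --warmup-updates
    # (which appears in the usage message) but not --warmup-ratio.
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument("--warmup-updates", type=int, default=0)
    return parser


def try_parse(argv):
    """Return (namespace, exit_code); exit_code is None when parsing succeeds."""
    parser = build_parser()
    try:
        return parser.parse_args(argv), None
    except SystemExit as exc:
        # On an unknown option, argparse prints the usage plus
        # "error: unrecognized arguments: ..." to stderr and exits with status 2.
        return None, exc.code


if __name__ == "__main__":
    ns, code = try_parse(["--warmup-updates=100"])
    print(ns.warmup_updates, code)

    ns, code = try_parse(["--warmup-ratio=0.01"])
    print(ns, code)

    # parse_known_args collects unknown options instead of exiting.
    known, unknown = build_parser().parse_known_args(["--warmup-ratio=0.01"])
    print(unknown)
```

In this sketch, the first call succeeds, the second exits with code 2, and `parse_known_args` returns the rejected flag in its second value.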

JustinLin610 commented 2 years ago

Duplicate of issue #88.