facebookresearch / muss

Code and models used in "MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases".

train model failed #5

Closed akafen closed 3 years ago

akafen commented 3 years ago

I changed cluster from "local" to "debug" in scripts/train_model.py and ran the command "python3 scripts/train_models.py", but it failed. The error:

fairseq-train /home/liuyijiao/muss/resources/datasets/_d41b33752d58c3fa688aef596b98df2b/fairseq_preprocessed_complex-simple --task translation --source-lang complex --target-lang simple --save-dir /home/liuyijiao/muss/experiments/fairseq/slurmjob_DEBUG_139908269653632/checkpoints --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --lr-scheduler polynomial_decay --lr 3e-05 --warmup-updates 2500 --update-freq 16 --arch mbart_large --dropout 0.3 --weight-decay 0.0 --clip-norm 0.1 --share-all-embeddings --no-epoch-checkpoints --save-interval 999999 --validate-interval 999999 --max-update 50000 --save-interval-updates 100 --keep-interval-updates 1 --patience 10 --max-sentences 64 --seed 708 --distributed-world-size 8 --distributed-port 11733 --fp16 --restore-file '/home/liuyijiao/muss/resources/models/mbart/model.pt' --task 'translation_from_pretrained_bart' --source-lang 'complex' --target-lang 'simple' --encoder-normalize-before --decoder-normalize-before --label-smoothing 0.2 --dataset-impl 'mmap' --optimizer 'adam' --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' --min-lr -1 --total-num-update 40000 --attention-dropout 0.1 --weight-decay 0.0 --max-tokens 1024 --update-freq 2 --log-format 'simple' --log-interval 2 --reset-optimizer --reset-meters --reset-dataloader --reset-lr-scheduler --langs 'ar_AR,cs_CZ,de_DE,en_XX,es_XX,et_EE,fi_FI,fr_XX,gu_IN,hi_IN,it_IT,ja_XX,kk_KZ,ko_KR,lt_LT,lv_LV,my_MM,ne_NP,nl_XX,ro_RO,ru_RU,si_LK,tr_TR,vi_VN,zh_CN' --layernorm-embedding --ddp-backend 'no_c10d' usage: train_models.py [-h] [--no-progress-bar] [--log-interval LOG_INTERVAL] [--log-format {json,none,simple,tqdm}] [--tensorboard-logdir TENSORBOARD_LOGDIR] [--seed SEED] [--cpu] [--tpu] [--bf16] [--memory-efficient-bf16] [--fp16] [--memory-efficient-fp16] [--fp16-no-flatten-grads] [--fp16-init-scale FP16_INIT_SCALE] [--fp16-scale-window FP16_SCALE_WINDOW] [--fp16-scale-tolerance FP16_SCALE_TOLERANCE] [--min-loss-scale MIN_LOSS_SCALE] [--threshold-loss-scale THRESHOLD_LOSS_SCALE] [--user-dir USER_DIR] [--empty-cache-freq EMPTY_CACHE_FREQ] [--all-gather-list-size ALL_GATHER_LIST_SIZE] [--model-parallel-size MODEL_PARALLEL_SIZE] [--checkpoint-suffix CHECKPOINT_SUFFIX] [--checkpoint-shard-count CHECKPOINT_SHARD_COUNT] [--quantization-config-path QUANTIZATION_CONFIG_PATH] [--profile] [--criterion {sentence_ranking,label_smoothed_cross_entropy,label_smoothed_cross_entropy_with_alignment,sentence_prediction,cross_entropy,ctc,legacy_masked_lm_loss,masked_lm,adaptive_loss,nat_loss,composite_loss,wav2vec,vocab_parallel_cross_entropy}] [--tokenizer {nltk,moses,space}] [--bpe {byte_bpe,subword_nmt,sentencepiece,gpt2,characters,bert,hf_byte_bpe,bytes,fastbpe}] [--optimizer {sgd,adagrad,nag,adadelta,lamb,adafactor,adamax,adam}] [--lr-scheduler {inverse_sqrt,tri_stage,reduce_lr_on_plateau,triangular,polynomial_decay,cosine,fixed}] [--scoring {sacrebleu,bleu,wer,chrf}] [--task TASK] [--num-workers NUM_WORKERS] [--skip-invalid-size-inputs-valid-test] [--max-tokens MAX_TOKENS] [--batch-size BATCH_SIZE] [--required-batch-size-multiple REQUIRED_BATCH_SIZE_MULTIPLE] [--required-seq-len-multiple REQUIRED_SEQ_LEN_MULTIPLE] [--dataset-impl {raw,lazy,cached,mmap,fasta}] [--data-buffer-size DATA_BUFFER_SIZE] [--train-subset TRAIN_SUBSET] [--valid-subset VALID_SUBSET] [--validate-interval VALIDATE_INTERVAL] [--validate-interval-updates VALIDATE_INTERVAL_UPDATES] [--validate-after-updates VALIDATE_AFTER_UPDATES] [--fixed-validation-seed FIXED_VALIDATION_SEED] 
[--disable-validation] [--max-tokens-valid MAX_TOKENS_VALID] [--batch-size-valid BATCH_SIZE_VALID] [--curriculum CURRICULUM] [--gen-subset GEN_SUBSET] [--num-shards NUM_SHARDS] [--shard-id SHARD_ID] [--distributed-world-size DISTRIBUTED_WORLD_SIZE] [--distributed-rank DISTRIBUTED_RANK] [--distributed-backend DISTRIBUTED_BACKEND] [--distributed-init-method DISTRIBUTED_INIT_METHOD] [--distributed-port DISTRIBUTED_PORT] [--device-id DEVICE_ID] [--distributed-no-spawn] [--ddp-backend {c10d,no_c10d}] [--bucket-cap-mb BUCKET_CAP_MB] [--fix-batches-to-gpus] [--find-unused-parameters] [--fast-stat-sync] [--broadcast-buffers] [--distributed-wrapper {DDP,SlowMo}] [--slowmo-momentum SLOWMO_MOMENTUM] [--slowmo-algorithm SLOWMO_ALGORITHM] [--localsgd-frequency LOCALSGD_FREQUENCY] [--nprocs-per-node NPROCS_PER_NODE] [--pipeline-model-parallel] [--pipeline-balance PIPELINE_BALANCE] [--pipeline-devices PIPELINE_DEVICES] [--pipeline-chunks PIPELINE_CHUNKS] [--pipeline-encoder-balance PIPELINE_ENCODER_BALANCE] [--pipeline-encoder-devices PIPELINE_ENCODER_DEVICES] [--pipeline-decoder-balance PIPELINE_DECODER_BALANCE] [--pipeline-decoder-devices PIPELINE_DECODER_DEVICES] [--pipeline-checkpoint {always,never,except_last}] [--zero-sharding {none,os}] [--arch ARCH] [--max-epoch MAX_EPOCH] [--max-update MAX_UPDATE] [--stop-time-hours STOP_TIME_HOURS] [--clip-norm CLIP_NORM] [--sentence-avg] [--update-freq UPDATE_FREQ] [--lr LR] [--min-lr MIN_LR] [--use-bmuf] [--save-dir SAVE_DIR] [--restore-file RESTORE_FILE] [--finetune-from-model FINETUNE_FROM_MODEL] [--reset-dataloader] [--reset-lr-scheduler] [--reset-meters] [--reset-optimizer] [--optimizer-overrides OPTIMIZER_OVERRIDES] [--save-interval SAVE_INTERVAL] [--save-interval-updates SAVE_INTERVAL_UPDATES] [--keep-interval-updates KEEP_INTERVAL_UPDATES] [--keep-last-epochs KEEP_LAST_EPOCHS] [--keep-best-checkpoints KEEP_BEST_CHECKPOINTS] [--no-save] [--no-epoch-checkpoints] [--no-last-checkpoints] [--no-save-optimizer-state] [--best-checkpoint-metric BEST_CHECKPOINT_METRIC] [--maximize-best-checkpoint-metric] [--patience PATIENCE] [--activation-fn {relu,gelu,gelu_fast,gelu_accurate,tanh,linear}] [--dropout D] [--attention-dropout D] [--activation-dropout D] [--encoder-embed-path STR] [--encoder-embed-dim N] [--encoder-ffn-embed-dim N] [--encoder-layers N] [--encoder-attention-heads N] [--encoder-normalize-before] [--encoder-learned-pos] [--decoder-embed-path STR] [--decoder-embed-dim N] [--decoder-ffn-embed-dim N] [--decoder-layers N] [--decoder-attention-heads N] [--decoder-learned-pos] [--decoder-normalize-before] [--decoder-output-dim N] [--share-decoder-input-output-embed] [--share-all-embeddings] [--no-token-positional-embeddings] [--adaptive-softmax-cutoff EXPR] [--adaptive-softmax-dropout D] [--layernorm-embedding] [--no-scale-embedding] [--no-cross-attention] [--cross-self-attention] [--encoder-layerdrop D] [--decoder-layerdrop D] [--encoder-layers-to-keep ENCODER_LAYERS_TO_KEEP] [--decoder-layers-to-keep DECODER_LAYERS_TO_KEEP] [--quant-noise-pq D] [--quant-noise-pq-block-size D] [--quant-noise-scalar D] [--pooler-dropout D] [--pooler-activation-fn {relu,gelu,gelu_fast,gelu_accurate,tanh,linear}] [--spectral-norm-classification-head] [--label-smoothing D] [--report-accuracy] [--ignore-prefix-size IGNORE_PREFIX_SIZE] [--adam-betas ADAM_BETAS] [--adam-eps ADAM_EPS] [--weight-decay WEIGHT_DECAY] [--use-old-adam] [--force-anneal N] [--warmup-updates N] [--end-learning-rate END_LEARNING_RATE] [--power POWER] [--total-num-update TOTAL_NUM_UPDATE] [-s SRC] [-t 
TARGET] [--load-alignments] [--left-pad-source BOOL] [--left-pad-target BOOL] [--max-source-positions N] [--max-target-positions N] [--upsample-primary UPSAMPLE_PRIMARY] [--truncate-source] [--num-batch-buckets N] [--eval-bleu] [--eval-bleu-detok EVAL_BLEU_DETOK] [--eval-bleu-detok-args JSON] [--eval-tokenized-bleu] [--eval-bleu-remove-bpe [EVAL_BLEU_REMOVE_BPE]] [--eval-bleu-args JSON] [--eval-bleu-print-samples] --langs LANG [--prepend-bos] data

train_models.py: error: unrecognized arguments: --max-sentences 64

fairseq_prepare_and_train failed after 0.87s.
fairseq_train_and_evaluate_with_parametrization failed after 0.87s.

The code:

for exp_name, kwargs in tqdm(kwargs_dict.items()):
    executor = get_executor(
        cluster='debug',
        slurm_partition='priority',
        submit_decorators=[print_function_name, print_args, print_job_id, print_result, print_running_time],
        timeout_min=2 * 24 * 60,
        slurm_comment='EMNLP Arxiv deadline May 1st',
        gpus_per_node=kwargs['train_kwargs']['ngpus'],
        nodes=1,
        slurm_constraint='volta32gb',
        name=exp_name,
    )
    for i in range(5):
        job = executor.submit(fairseq_train_and_evaluate_with_parametrization, **kwargs)
        jobs_dict[exp_name].append(job)
[job.result() for jobs in jobs_dict.values() for job in jobs]
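
For reference, the failure can presumably also be reproduced without the executor by calling the training function directly; a minimal sketch (assuming the same kwargs_dict as above):

# Hypothetical debugging variant (not in the repo): run one configuration in the
# current process instead of submitting jobs through the executor.
for exp_name, kwargs in kwargs_dict.items():
    print(f'Running {exp_name} in-process')
    result = fairseq_train_and_evaluate_with_parametrization(**kwargs)
    print(exp_name, result)
    break  # stop after the first configuration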

When cluster is "local", training fails too.

louismartin commented 3 years ago

Hi @akafen, thanks for pointing out this issue. I'm currently fixing the bugs in model training and I hope I can give you updated code soon.

NomadXD commented 3 years ago

@akafen Can you specify the infrastructure specs you are trying to run the setup on? Like the GPUs and the memory.

akafen commented 3 years ago

@NomadXD Of course. I am using one GPU to run the code:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:02:00.0 Off |                  N/A |
| 19%   35C    P0    60W / 250W |      0MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

But I think the error is not about the infrastructure specs. The error is that --max-sentences is not an argument of fairseq-train. From the error message:

train_models.py: error: unrecognized arguments: --max-sentences 64

"Max sentences" is an unrecognized arguments

louismartin commented 3 years ago

Yes, it's due to a problem with the fairseq version. I'm trying to find a solution to that right now :)

louismartin commented 3 years ago

Basically, the --max-sentences argument was removed in this PR (fairseq>=0.9.0, which we install using pip) and was later added back, for backward compatibility only, in this PR (fairseq>=1.0.0a0, not yet on pip).
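
If you need the same script to work with both flavors, one possible workaround (just a sketch, not something in the repo) is to ask the installed fairseq-train which flag it exposes before building the command:

# Hypothetical check (not from the repo): see which batch-size flag the installed
# fairseq-train exposes before building the training command.
import subprocess

help_text = subprocess.run(['fairseq-train', '--help'], capture_output=True, text=True).stdout
# Pip releases after the removal only expose --batch-size; other versions keep --max-sentences.
batch_flag = '--max-sentences' if '--max-sentences' in help_text else '--batch-size'
print(f'This fairseq install expects {batch_flag}')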

akafen commented 3 years ago

@louismartin I am using fairseq==0.10.2, so should I remove the --max-sentences argument and change it to batch_size?

louismartin commented 3 years ago

Yes, that is the solution I just pushed. I also made the training script simpler if you want to train a single model; maybe that can help as well. Tell me if that works well on your end and we can close the issue.
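
If you are on an older checkout, the change boils down to passing the batch size as --batch-size instead of --max-sentences; roughly (the exact kwargs key names below are assumed, not necessarily the ones in the repo):

# Sketch only: rename the removed fairseq flag in the training kwargs before they
# are turned into the fairseq-train command (key names here are assumed).
def use_batch_size_flag(train_kwargs):
    if 'max_sentences' in train_kwargs:
        train_kwargs['batch_size'] = train_kwargs.pop('max_sentences')
    return train_kwargs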

Atharva-Phatak commented 3 years ago

@louismartin Any plans on porting this to pure PyTorch? If ported to PyTorch, maybe it would be more accessible to people who don't have knowledge of fairseq?

louismartin commented 3 years ago

Hi @Atharva-Phatak ,

Thanks for the message. fairseq uses PyTorch, and there is no plan to use anything else.

louismartin commented 3 years ago

@akafen Did it solve your issue?

louismartin commented 3 years ago

I'm closing the issue but feel free to open a new issue if you have further questions or problems.