facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

--max-tokens: invalid typing.Optional[int] #4032

Open KMFODA opened 2 years ago

KMFODA commented 2 years ago

🐛 Bug

Getting an invalid typing.Optional[int] error when setting --max-tokens and trying to train a model using fairseq-train.

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

  1. Run:

CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/iwslt14.tokenized.de-en --lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 --arch fconv_iwslt_de_en --save-dir checkpoints/fconv

  2. See error:
usage: fairseq-train [-h] [--no-progress-bar] [--log-interval LOG_INTERVAL] [--log-format LOG_FORMAT]
                     [--tensorboard-logdir TENSORBOARD_LOGDIR] [--seed SEED] [--cpu] [--tpu] [--bf16] [--memory-efficient-bf16]
                     [--fp16] [--memory-efficient-fp16] [--fp16-no-flatten-grads] [--fp16-init-scale FP16_INIT_SCALE]
                     [--fp16-scale-window FP16_SCALE_WINDOW] [--fp16-scale-tolerance FP16_SCALE_TOLERANCE]
                     [--min-loss-scale MIN_LOSS_SCALE] [--threshold-loss-scale THRESHOLD_LOSS_SCALE] [--user-dir USER_DIR]
                     [--empty-cache-freq EMPTY_CACHE_FREQ] [--all-gather-list-size ALL_GATHER_LIST_SIZE]
                     [--model-parallel-size MODEL_PARALLEL_SIZE] [--checkpoint-suffix CHECKPOINT_SUFFIX]
                     [--checkpoint-shard-count CHECKPOINT_SHARD_COUNT] [--quantization-config-path QUANTIZATION_CONFIG_PATH]
                     [--profile]
                     [--criterion {cross_entropy,ctc,adaptive_loss,wav2vec,legacy_masked_lm_loss,nat_loss,label_smoothed_cross_entropy,composite_loss,sentence_prediction,label_smoothed_cross_entropy_with_alignment,masked_lm,sentence_ranking,vocab_parallel_cross_entropy}]
                     [--tokenizer {nltk,space,moses}]
                     [--bpe {sentencepiece,fastbpe,gpt2,subword_nmt,hf_byte_bpe,bert,byte_bpe,characters,bytes}]
                     [--optimizer {nag,adafactor,sgd,adamax,adagrad,adam,lamb,adadelta}]
                     [--lr-scheduler {fixed,reduce_lr_on_plateau,polynomial_decay,inverse_sqrt,tri_stage,cosine,triangular}]
                     [--scoring {sacrebleu,bleu,wer,chrf}] [--task TASK] [--num-workers NUM_WORKERS]
                     [--skip-invalid-size-inputs-valid-test] [--max-tokens MAX_TOKENS] [--batch-size BATCH_SIZE]
                     [--required-batch-size-multiple REQUIRED_BATCH_SIZE_MULTIPLE]
                     [--required-seq-len-multiple REQUIRED_SEQ_LEN_MULTIPLE] [--dataset-impl DATASET_IMPL]
                     [--data-buffer-size DATA_BUFFER_SIZE] [--train-subset TRAIN_SUBSET] [--valid-subset VALID_SUBSET]
                     [--validate-interval VALIDATE_INTERVAL] [--validate-interval-updates VALIDATE_INTERVAL_UPDATES]
                     [--validate-after-updates VALIDATE_AFTER_UPDATES] [--fixed-validation-seed FIXED_VALIDATION_SEED]
                     [--disable-validation] [--max-tokens-valid MAX_TOKENS_VALID] [--batch-size-valid BATCH_SIZE_VALID]
                     [--curriculum CURRICULUM] [--gen-subset GEN_SUBSET] [--num-shards NUM_SHARDS] [--shard-id SHARD_ID]
                     [--distributed-world-size DISTRIBUTED_WORLD_SIZE] [--distributed-rank DISTRIBUTED_RANK]
                     [--distributed-backend DISTRIBUTED_BACKEND] [--distributed-init-method DISTRIBUTED_INIT_METHOD]
                     [--distributed-port DISTRIBUTED_PORT] [--device-id DEVICE_ID] [--local-rank LOCAL_RANK]
                     [--distributed-no-spawn] [--ddp-backend {c10d,no_c10d}] [--bucket-cap-mb BUCKET_CAP_MB]
                     [--fix-batches-to-gpus] [--find-unused-parameters] [--fast-stat-sync] [--broadcast-buffers]
                     [--distributed-wrapper {DDP,SlowMo}] [--slowmo-momentum SLOWMO_MOMENTUM] [--slowmo-algorithm SLOWMO_ALGORITHM]
                     [--localsgd-frequency LOCALSGD_FREQUENCY] [--nprocs-per-node NPROCS_PER_NODE] [--pipeline-model-parallel]
                     [--pipeline-balance PIPELINE_BALANCE] [--pipeline-devices PIPELINE_DEVICES]
                     [--pipeline-chunks PIPELINE_CHUNKS] [--pipeline-encoder-balance PIPELINE_ENCODER_BALANCE]
                     [--pipeline-encoder-devices PIPELINE_ENCODER_DEVICES] [--pipeline-decoder-balance PIPELINE_DECODER_BALANCE]
                     [--pipeline-decoder-devices PIPELINE_DECODER_DEVICES] [--pipeline-checkpoint {always,never,except_last}]
                     [--zero-sharding {none,os}] [--arch ARCH] [--max-epoch MAX_EPOCH] [--max-update MAX_UPDATE]
                     [--stop-time-hours STOP_TIME_HOURS] [--clip-norm CLIP_NORM] [--sentence-avg] [--update-freq UPDATE_FREQ]
                     [--lr LR] [--min-lr MIN_LR] [--use-bmuf] [--save-dir SAVE_DIR] [--restore-file RESTORE_FILE]
                     [--finetune-from-model FINETUNE_FROM_MODEL] [--reset-dataloader] [--reset-lr-scheduler] [--reset-meters]
                     [--reset-optimizer] [--optimizer-overrides OPTIMIZER_OVERRIDES] [--save-interval SAVE_INTERVAL]
                     [--save-interval-updates SAVE_INTERVAL_UPDATES] [--keep-interval-updates KEEP_INTERVAL_UPDATES]
                     [--keep-last-epochs KEEP_LAST_EPOCHS] [--keep-best-checkpoints KEEP_BEST_CHECKPOINTS] [--no-save]
                     [--no-epoch-checkpoints] [--no-last-checkpoints] [--no-save-optimizer-state]
                     [--best-checkpoint-metric BEST_CHECKPOINT_METRIC] [--maximize-best-checkpoint-metric] [--patience PATIENCE]
fairseq-train: error: argument --max-tokens: invalid typing.Optional[int] value: '4000'

Code sample
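A minimal standalone sketch of the failure mode (not fairseq code): the error text suggests that the raw typing.Optional[int] annotation is being handed to argparse as the type callable, and argparse produces exactly this message when calling it fails:

```python
import argparse
from typing import Optional

parser = argparse.ArgumentParser(prog="fairseq-train")
# Passing the typing alias itself as `type` instead of a plain callable:
parser.add_argument("--max-tokens", type=Optional[int])

# argparse tries Optional[int]("4000"), which raises TypeError, and
# argparse reports that as:
#   error: argument --max-tokens: invalid typing.Optional[int] value: '4000'
parser.parse_args(["--max-tokens", "4000"])
```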

Expected behavior

--max-tokens 4000 is parsed as an integer and training starts.

Environment

LyricZhao commented 2 years ago

It seems to be a Python 3.9 compatibility bug. Downgrading to Python 3.7 works for me.
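A plausible mechanism, assuming fairseq's dataclass-to-argparse conversion decides how to unwrap a field by matching on the string form of its annotation: Python 3.9 changed how Optional renders, so a check written against the 3.8 spelling silently stops matching and the raw annotation falls through to argparse as the type:

```python
from typing import Optional

# Python 3.8 and earlier:  str(Optional[int]) == 'typing.Union[int, NoneType]'
# Python 3.9 and later:    str(Optional[int]) == 'typing.Optional[int]'
print(str(Optional[int]))

# A check written for the old spelling, e.g.
#   re.match(r"(typing\.|^)Union\[(.*), NoneType\]$", typestring)
# no longer matches on 3.9, so Optional[int] is never unwrapped to int.
```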

Adridot commented 2 years ago

Hi, I am having the same issue, and unfortunately downgrading to Python 3.7.12 didn't solve it. The error occurs in PyCharm when using a Jupyter notebook, but it doesn't happen on Google Colab, even though the Python and fairseq versions are identical.

panserbjorn commented 2 years ago

Any news regarding this error? I have the same problem and cannot downgrade to Python 3.7 at the moment.
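For anyone who cannot downgrade Python, two possible workarounds, neither verified here. First, install fairseq from current source, where the dataclass handling has since been updated:

pip install git+https://github.com/facebookresearch/fairseq.git

Second, a hot-patch sketch, assuming the unwrapping helper lives at fairseq.dataclass.utils.interpret_dc_type in your installed version (check before relying on this); it must run before fairseq_cli or fairseq.options is imported:

```python
import re

import fairseq.dataclass.utils as fdu

def patched_interpret_dc_type(field_type):
    # Unwrap Optional[T] to T, covering both the 3.8 and the 3.9 reprs.
    typestring = str(field_type)
    if (re.match(r"(typing\.|^)Union\[(.*), NoneType\]$", typestring)
            or typestring.startswith("typing.Optional")):
        return field_type.__args__[0]
    return field_type

# Note: modules that already did `from fairseq.dataclass.utils import
# interpret_dc_type` keep their old reference, so apply this patch early.
fdu.interpret_dc_type = patched_interpret_dc_type
```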