facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

TypeError: 'bool' object is not callable #2486

Open · thpun opened this issue 4 years ago

thpun commented 4 years ago

### 🐛 Bug

### To Reproduce

Steps to reproduce the behavior:

Training starts smoothly with temperature sampling, but fails with RoundRobin.

  1. Run the following command:

     ```bash
     lang_pairs=<comma-separated list of training lang pairs>
     DATA=<path to training data>
     lang_list=<path to 25 lang id file>
     MODEL=<path to pretrained model>
     SAVEDIR=<path to save directory>
     fairseq-train $DATA \
         --finetune-from-model $MODEL \
         --encoder-normalize-before --decoder-normalize-before \
         --arch mbart_large --layernorm-embedding \
         --task translation_multi_simple_epoch \
         --sampling-method "RoundRobin" \
         --encoder-langtok "src" \
         --decoder-langtok \
         --lang-dict "$lang_list" \
         --lang-pairs "$lang_pairs" \
         --criterion label_smoothed_cross_entropy --label-smoothing 0.2 \
         --optimizer adam --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' \
         --lr-scheduler inverse_sqrt --lr 6e-05 --min-lr -1 --warmup-updates 5000 \
         --dropout 0.3 --attention-dropout 0.1 --weight-decay 0.0 \
         --max-tokens 1024 --update-freq 4 --virtual-epoch-size 2000000 \
         --save-interval 1 --save-interval-updates 2500 --no-epoch-checkpoints \
         --seed 222 --log-format simple --log-interval 10 \
         --fp16 --max-update 100000 --save-dir $SAVEDIR
     ```
  2. See error:

     ```
     Traceback (most recent call last):
       File "/opt/conda/bin/fairseq-train", line 33, in <module>
         sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
       File "/workspace/fairseq/fairseq_cli/train.py", line 351, in cli_main
         distributed_utils.call_main(args, main)
       File "/workspace/fairseq/fairseq/distributed_utils.py", line 174, in call_main
         args.distributed_world_size,
       File "/opt/conda/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
         return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
       File "/opt/conda/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
         while not context.join():
       File "/opt/conda/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 119, in join
         raise Exception(msg)
     Exception:

     -- Process 3 terminated with the following error:
     Traceback (most recent call last):
       File "/opt/conda/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
         fn(i, *args)
       File "/workspace/fairseq/fairseq/distributed_utils.py", line 156, in distributed_main
         main(args, kwargs)
       File "/workspace/fairseq/fairseq_cli/train.py", line 106, in main
         extra_state, epoch_itr = checkpoint_utils.load_checkpoint(args, trainer)
       File "/workspace/fairseq/fairseq/checkpoint_utils.py", line 188, in load_checkpoint
         epoch=1, load_dataset=True, passthrough_args
       File "/workspace/fairseq/fairseq/trainer.py", line 350, in get_train_iterator
         epoch=epoch
       File "/workspace/fairseq/fairseq/tasks/translation_multi_simple_epoch.py", line 303, in get_batch_iterator
         seed=seed, num_shards=num_shards, shard_id=shard_id, num_workers=num_workers, epoch=epoch,
       File "/workspace/fairseq/fairseq/tasks/fairseq_task.py", line 184, in get_batch_iterator
         required_batch_size_multiple=required_batch_size_multiple,
     TypeError: 'bool' object is not callable
     ```
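The traceback ends in a generic Python error rather than a fairseq-specific one. As a hedged illustration only (the `Dataset` class and `total_tokens` helper below are hypothetical, not fairseq's API, and this is not a diagnosis of the actual bug), the sketch shows a common way `TypeError: 'bool' object is not callable` arises: code calls a name it expects to be a method, but that name has been bound to a bool.

```python
class Dataset:
    """Hypothetical dataset used only for illustration; not a fairseq class."""

    def __init__(self, sizes):
        self.sizes = sizes

    def num_tokens(self, index: int) -> int:
        # Return the size (in tokens) of the example at `index`.
        return self.sizes[index]


def total_tokens(dataset: Dataset) -> int:
    # The caller assumes dataset.num_tokens is callable.
    return sum(dataset.num_tokens(i) for i in range(len(dataset.sizes)))


good = Dataset([3, 10, 5])
print(total_tokens(good))  # 18

bad = Dataset([3, 10, 5])
bad.num_tokens = True  # an instance attribute now shadows the method
try:
    total_tokens(bad)
except TypeError as err:
    print(err)  # 'bool' object is not callable
```

In the reported traceback, the failing call sits in the statement ending at `fairseq_task.py` line 184; identifying which name holds a bool on the RoundRobin path would require reading that file at the reported commit.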



### Expected behavior

Training starts smoothly, just as it does with temperature sampling.

### Environment

 - fairseq Version (e.g., 1.0 or master): master, commit 4c55744ec4cb26749cf2cf8dac89942f26ce4bd2
 - PyTorch Version (e.g., 1.0): `1.5.0a0+8f84ded`
 - OS (e.g., Linux): Linux
 - How you installed fairseq (`pip`, source): source
 - Build command you used (if compiling from source): `pip install --editable .`
 - Python version: 3.6.9
 - CUDA/cuDNN version: 10.1
 - GPU models and configuration: V100
thpun commented 4 years ago

cc @tangyuq: Does translation_multi_simple_epoch currently support sampling methods other than temperature sampling?

tangyuq commented 4 years ago

translation_multi_simple_epoch only supports "temperature" sampling right now.

thpun commented 4 years ago

Are there any plans to support the remaining `--sampling-method` options (i.e., uniform, Concat, and RoundRobin) in the near future?
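For context on the temperature sampling discussed in this thread: it is commonly defined as drawing language pair `i` with probability `p_i ∝ (n_i / N)^(1/T)`, where `n_i` is that pair's training-set size, `N` is the total size, and `T` is the temperature. The sketch below is a generic illustration of that formula, not fairseq's implementation; the function name and corpus sizes are hypothetical. With `T = 1` the probabilities are proportional to data size, and as `T` grows they approach uniform sampling.

```python
from typing import Dict


def temperature_sampling_probs(sizes: Dict[str, int], temperature: float) -> Dict[str, float]:
    """Hypothetical helper: p_i proportional to (n_i / N) ** (1 / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    total = sum(sizes.values())
    # Raise each pair's data share to the power 1/T, then renormalize.
    weights = {pair: (n / total) ** (1.0 / temperature) for pair, n in sizes.items()}
    norm = sum(weights.values())
    return {pair: w / norm for pair, w in weights.items()}


# Hypothetical corpus sizes (sentence pairs) per language pair.
sizes = {"en_XX-de_DE": 900_000, "en_XX-ne_NP": 100_000}

print(temperature_sampling_probs(sizes, 1.0))  # proportional: about 0.9 / 0.1
print(temperature_sampling_probs(sizes, 5.0))  # flatter: about 0.61 / 0.39
print(temperature_sampling_probs(sizes, 1e6))  # near uniform: about 0.5 / 0.5
```

Higher temperatures upsample low-resource pairs at the expense of high-resource ones.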