facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

`'NoneType' object is not subscriptable` when running fairseq-train #2658

Closed ethch18 closed 4 years ago

ethch18 commented 4 years ago

What is your question?

I'm new to fairseq and am trying to train a simple LSTM-based model for a grapheme-to-phoneme conversion task, using a command similar to the one here. I have five different datasets that I've generated with the `fairseq-preprocess` command, and I'm able to complete model training for four of them. However, on the fifth (and smallest) one, I get the following error (full output here):

```
-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/homes/gws/echau18/miniconda3/envs/loanwords/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/homes/gws/echau18/lib/fairseq/fairseq_cli/train.py", line 296, in distributed_main
    main(args, init_distributed=True)
  File "/homes/gws/echau18/lib/fairseq/fairseq_cli/train.py", line 86, in main
    train(args, trainer, task, epoch_itr)
  File "/homes/gws/echau18/lib/fairseq/fairseq_cli/train.py", line 127, in train
    log_output = trainer.train_step(samples)
  File "/homes/gws/echau18/lib/fairseq/fairseq/trainer.py", line 330, in train_step
    sample, self.model, self.criterion, self.optimizer, ignore_grad
  File "/homes/gws/echau18/lib/fairseq/fairseq/tasks/fairseq_task.py", line 251, in train_step
    loss, sample_size, logging_output = criterion(model, sample)
  File "/homes/gws/echau18/miniconda3/envs/loanwords/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/homes/gws/echau18/lib/fairseq/fairseq/criterions/cross_entropy.py", line 28, in forward
    net_output = model(**sample['net_input'])
TypeError: 'NoneType' object is not subscriptable
```

Other issues I've seen with this error involved OOMs or bugs that have since been fixed, but since my dataset is very small (1,173 examples over two Titan X GPUs), I don't think OOM is the problem. Any pointers on how to approach this? Thanks in advance!
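For context on how a `None` sample can appear even when the dataset loads fine: a minimal sketch of distributed batch sharding, under the assumption that the sharded iterator (fairseq's `ShardedIterator` behaved roughly like this at the time) pads uneven shards with a fill value so every rank takes the same number of steps. With a tiny dataset, one rank's shard runs short and it receives the fill value, `None`, as its sample.

```python
import itertools

def shard(batches, num_shards, shard_id, fill_value=None):
    """Round-robin sharding of batches across ranks, padding uneven
    shards with fill_value (a sketch, not fairseq's exact code)."""
    shards = [batches[i::num_shards] for i in range(num_shards)]
    padded = list(itertools.zip_longest(*shards, fillvalue=fill_value))
    return [step[shard_id] for step in padded]

# 3 batches over 2 GPUs: the shards are uneven, so rank 1 is padded.
batches = ["b0", "b1", "b2"]
print(shard(batches, 2, 0))  # ['b0', 'b2']
print(shard(batches, 2, 1))  # ['b1', None]  <- rank 1 sees a None sample
```

This matches the symptom exactly: `sample['net_input']` raises `TypeError: 'NoneType' object is not subscriptable` only on the rank whose shard was padded.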

What have you tried?

I've tried printing out `sample` and `sample['net_input']` and confirmed that neither is `None`. I'm not sure where else to look.

What's your environment?

ethch18 commented 4 years ago

Realized that `sample` was `None` on the other GPU, so adjusting the batch size / using only one device fixed this!
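The resolution can be sanity-checked with some quick arithmetic: if the effective batch size is large enough that the whole dataset fits in fewer batches than there are GPUs, some rank ends up with nothing (and gets a padded `None` sample instead). A rough sketch, assuming simple round-robin assignment of batches to ranks (the helper and numbers here are illustrative, not fairseq's actual bucketing):

```python
import math

def batches_per_rank(num_examples, batch_size, num_ranks):
    """Count how many real batches each rank sees under round-robin
    sharding (illustrative sketch only)."""
    total_batches = math.ceil(num_examples / batch_size)
    return [len(range(r, total_batches, num_ranks)) for r in range(num_ranks)]

# A 1173-example dataset with a batch size that yields a single batch:
# the second GPU gets zero real batches, hence a None sample.
print(batches_per_rank(1173, 2048, 2))  # [1, 0]

# Shrinking the batch size (or dropping to one device) gives every
# rank at least one real batch:
print(batches_per_rank(1173, 256, 2))   # [3, 2]
```

So either reducing the batch size or training on a single device keeps every rank supplied with real data, which is exactly the fix described above.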