facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License

IndexError: too many indices for tensor of dimension 1 #1161

Closed carokun closed 6 years ago

carokun commented 6 years ago

When I run the command

python3.6 examples/train_model.py -m seq2seq -t cornell_movie -dt train -mf /data2/chatbot_eval_issues/model_files/parlai/carolineokun/training_s2s_cmdb --gpu 0

I get the error:

Traceback (most recent call last):
  File "examples/train_model.py", line 28, in <module>
    TrainLoop(opt).train()
  File "/home/carokun/ParlAI/parlai/scripts/train_model.py", line 316, in train
    world.parley()
  File "/home/carokun/ParlAI/parlai/core/worlds.py", line 249, in parley
    acts[1] = agents[1].act()
  File "/home/carokun/ParlAI/parlai/core/torch_agent.py", line 771, in act
    return self.batch_act([self.observation])[0]
  File "/home/carokun/ParlAI/parlai/core/torch_agent.py", line 791, in batch_act
    batch = self.batchify(observations)
  File "/home/carokun/ParlAI/parlai/agents/seq2seq/seq2seq.py", line 302, in batchify
    return super().batchify(*args, **kwargs)
  File "/home/carokun/ParlAI/parlai/core/utils.py", line 546, in batchify
    xs, x_lens = padded_tensor(_xs, self.NULL_IDX, self.use_cuda)
  File "/home/carokun/ParlAI/parlai/core/utils.py", line 892, in padded_tensor
    output[i, :lens[i]] = item
IndexError: too many indices for tensor of dimension 1
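The failing line in `padded_tensor` assigns into `output` with two indices (`output[i, :lens[i]]`), which only works if `output` is at least two-dimensional. A minimal NumPy analogue (not ParlAI code; NumPy stands in here for PyTorch, which raises the same kind of IndexError) of what happens when the buffer degenerates to one dimension:

```python
import numpy as np

# Stand-in for the padded output buffer; in the bug it ends up 1-D
# instead of the expected (batch, max_len) shape.
output = np.zeros(4)

try:
    output[0, :2] = [1, 2]  # two indices into a 1-D buffer
except IndexError as err:
    print(err)  # e.g. "too many indices for array ..."
```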

jaseweston commented 6 years ago

do

python examples/display_data.py -t "cornell_movie"

and

python examples/display_data.py -t "#cornellmovie"

work for you? (they work for me)

carokun commented 6 years ago

No, the same error is thrown even when I add quotes.

emilydinan commented 6 years ago

I'm unable to reproduce this either. Does the error throw right away, or on a particular training example? Can you run the display_data command that @jaseweston posted, or does that not work either?

carokun commented 6 years ago

The error throws after around a minute and the display_data.py command does work!

stephenroller commented 6 years ago

Try adding --batchsize 32 to your call and see if that helps. I'm wondering if there's an obscure bug where we accidentally end up with an empty batch.
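A padding routine along these lines illustrates the kind of guard that matters here. This is a plain-Python sketch over lists of token ids, with an illustrative `null_idx` parameter; it is not ParlAI's actual `padded_tensor`:

```python
def padded_batch(items, null_idx=0):
    """Pad a list of token-id lists to equal length (pure-Python sketch)."""
    if len(items) == 0:
        # Guard: an empty batch would otherwise produce a degenerate output.
        return [], []
    lens = [len(item) for item in items]
    # Keep at least one column so the output never collapses to 1-D,
    # even if every item in the batch is empty.
    max_len = max(max(lens), 1)
    output = [[null_idx] * max_len for _ in items]
    for i, item in enumerate(items):
        output[i][:lens[i]] = item
    return output, lens
```

For example, `padded_batch([[1, 2], [3]])` returns `([[1, 2], [3, 0]], [2, 1])` - every row is padded out to the longest sequence, and the original lengths are returned alongside.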

alexholdenmiller commented 6 years ago

@carokun I was able to reproduce this and will work on a patch. As @stephenroller suggested, you should increase the batch size, which will dramatically speed up training of this model as well as making this error very unlikely to appear.

alexholdenmiller commented 6 years ago

The fix is up, but you should still use a bigger batch size: batch size 1 does ~15 exs/s on my GPU, whereas batch size 32 does ~275 exs/s on a relatively small sample.