bentrevett / pytorch-seq2seq

Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
MIT License

6 - Attention is All You Need: nn.Embedding-related runtime error with positional encoding layer during training #89

Closed Ninja16180 closed 4 years ago

Ninja16180 commented 4 years ago

Hi Ben, thanks for these awesome tutorials on PyTorch. While following the last one in the series, 6 - Attention is All You Need, I came across an nn.Embedding-related runtime error from the positional encoding layer during training.

I am executing the code in Google Colab.


Detailed error description:


RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>()
      8     start_time = time.time()
      9
---> 10     train_loss = train(model, train_iterator, optimizer, criterion, CLIP)
     11     valid_loss = evaluate(model, valid_iterator, criterion)
     12

7 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1482     # remove once script supports set_grad_enabled
   1483     _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1484     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1485
   1486

RuntimeError: index out of range: Tried to access index 100 out of table with 99 rows. at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:418

My query: I have checked for similar issues online and my understanding is that the error occurs because the embedding size must match the vocab size, but when initializing the embedding layer with len(vocab_size) it somehow subtracts 1. I referred to this GitHub issue to resolve the error: https://github.com/chenxijun1029/DeepFM_with_PyTorch/issues/1

Accordingly, for the positional embedding layer I explicitly added 1 to the max_length used when creating the layer:

self.pos_embedding = nn.Embedding(max_length + 1, hid_dim)

However, I ended up getting a similar runtime error:

RuntimeError: index out of range: Tried to access index 101 out of table with 100 rows. at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:418

Could you please help in resolving the issue? Thanks in advance!
bentrevett commented 4 years ago

There shouldn't be a need to do a +1 anywhere. I'm not sure why that solves the issue you linked; I believe that's some other bug in their code.

The issue here is from the positional embedding, not the token embedding. The positional embedding has a "vocabulary size" of 100, which means the maximum length sequence that can be passed to it is 100 tokens, because of the way the position indexes are created:

pos = torch.arange(0, trg_len).unsqueeze(0).repeat(batch_size, 1).to(self.device)

If you pass a sequence that is >100 tokens long, the pos tensor will contain values that are greater than the number of rows in your embedding matrix, hence you will get an index out of range error.
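As an illustrative sketch (not code from the tutorial itself), the same error can be reproduced in isolation like this:

import torch
import torch.nn as nn

# A positional embedding with a "vocabulary" of 100 positions (valid indices 0-99),
# mirroring the max_length = 100 default in the tutorial.
pos_embedding = nn.Embedding(100, 256)

batch_size, seq_len = 4, 120   # a sequence longer than max_length
pos = torch.arange(0, seq_len).unsqueeze(0).repeat(batch_size, 1)

# pos contains indices up to 119, so this lookup raises the same
# "RuntimeError: index out of range" as in the traceback above.
pos_embedding(pos)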

I actually should have added this to the tutorials: there needs to be a step where you cut your sentences to a maximum length, which should equal the "vocab size" of your positional embeddings, i.e. the max_length argument passed to the Encoder and Decoder.

One way to double-check that this is the error is to print out the sizes of the tensors fed into your model, by adding print(src.shape) at the start of the encoder's forward method and print(trg.shape) at the start of the decoder's. I predict you will see it print a tensor that is >100 elements long just before it throws the error.
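For example, a quick way to do that check outside the model (a sketch, assuming the train_iterator defined earlier in the notebook) is to inspect a single batch:

batch = next(iter(train_iterator))   # assumes the tutorial's BucketIterator
print(batch.src.shape)               # whichever dimension the model reads as the
print(batch.trg.shape)               # sequence length must not exceed max_length (100)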

One way to solve this is by setting a max length argument in the tokenizers:

def tokenize_de(text, max_length=100):
    return [tok.text for tok in spacy_de.tokenizer(text)][:max_length-2]

def tokenize_en(text, max_length=100):
    return [tok.text for tok in spacy_en.tokenizer(text)][:max_length-2]

Note that we have to do max_length-2 as our source and target sentences have <sos> and <eos> tokens appended to them.

Let me know if this solves the issue.

Ninja16180 commented 4 years ago

Thanks a lot Ben for your prompt and elaborate response.

Yes, as you rightly predicted, the input tensor size is >100 (as given below): torch.Size([33, 128])

However, I set the max_length argument in the tokenizer functions as you suggested, but unfortunately I still get the same runtime error:


RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>()
      8     start_time = time.time()
      9
---> 10     train_loss = train(model, train_iterator, optimizer, criterion, CLIP)
     11     valid_loss = evaluate(model, valid_iterator, criterion)
     12

7 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1482     # remove once script supports set_grad_enabled
   1483     _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1484     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1485
   1486

RuntimeError: index out of range: Tried to access index 100 out of table with 99 rows. at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:418

Attaching the file I used to run the code for your reference: [Practice_Transformer_Attention.ipynb.zip](https://github.com/bentrevett/pytorch-seq2seq/files/4377163/Practice_Transformer_Attention.ipynb.zip)

Could you please have a look and suggest where it is going wrong? Thanks in advance!
bentrevett commented 4 years ago

OK, I've found the error.

The transformer expects the data to be [batch_size, seq_len]; note that the batch is the first dimension. In PyTorch, RNNs expect the batch to be the second dimension, and because of RNNs' common usage in NLP, TorchText by default returns batches shaped [seq_len, batch_size].

The issue here is that your batch and sequence length dimensions are flipped (you have the batch second, as is the default, but the transformer should be fed examples with the batch first), and as you're using a batch size >100 you are getting the index out of range error.
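To make that concrete (an illustrative sketch, not code from your notebook): a TorchText batch of shape [33, 128] is [src_len, batch_size], but read as [batch_size, src_len] the model builds position indices 0..127, which overflows the 100-row positional embedding:

import torch
import torch.nn as nn

pos_embedding = nn.Embedding(100, 256)          # max_length = 100 positions

src = torch.zeros(33, 128, dtype=torch.long)    # [src_len, batch_size] from TorchText

# The encoder assumes [batch_size, src_len], so with the batch second it
# "sees" sequences of length 128:
batch_size, src_len = src.shape[0], src.shape[1]
pos = torch.arange(0, src_len).unsqueeze(0).repeat(batch_size, 1)

pos_embedding(pos)   # RuntimeError: index out of range (indices go up to 127)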

Luckily, it's an easy fix. Where you define your fields you need to set batch_first = True, so they should look like:

SRC = Field(tokenize = tokenize_de, 
            init_token = '<sos>', 
            eos_token = '<eos>', 
            lower = True, 
            batch_first = True) #added this

TRG = Field(tokenize = tokenize_en, 
            init_token = '<sos>', 
            eos_token = '<eos>', 
            lower = True, 
            batch_first = True) #added this

With this, you don't actually need the tokenizer cutting things down to max_length-2 anymore, as there are no examples in the Multi30k dataset that are >100 tokens.
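If you want to verify that yourself, a quick sanity check (my sketch, assuming the train_data returned by Multi30k.splits in the tutorial):

# Confirm no tokenized example comes close to the 100-position limit.
longest_src = max(len(example.src) for example in train_data.examples)
longest_trg = max(len(example.trg) for example in train_data.examples)
print(longest_src, longest_trg)   # both should be well under 98 (100 minus <sos>/<eos>)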

bentrevett commented 4 years ago

I'll try and stress further in the tutorials that the batch needs to be first for the transformer model.

Ninja16180 commented 4 years ago

Thanks a mil! This solution worked!

Ninja16180 commented 4 years ago

I am thus closing this issue.