actforjason opened 3 years ago
There is a line `self.embedding = nn.Embedding(input_dim, embedding_dim, padding_idx=pad_idx)` that passes `padding_idx=pad_idx`, but why aren't `pack_padded_sequence` and `pad_packed_sequence` needed?
You only need to use `pack_padded_sequence` and `pad_packed_sequence` if you want to do something with the final hidden (and cell) states, denoted in the code as `hidden` and `cell`, respectively, from `outputs, (hidden, cell) = self.lstm(embedded)`. This is because packing and padding allow us to get the final hidden state from the last non-pad element in each sequence within the batch.
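For concreteness, here's a minimal sketch of what packing and unpacking look like; the shapes, names, and toy data below are illustrative, not taken from the notebook:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

PAD_IDX = 0
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4, padding_idx=PAD_IDX)
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)

# toy batch of 2 sequences with true lengths 5 and 3, padded with PAD_IDX
tokens = torch.tensor([[1, 2, 3, 4, 5],
                       [6, 7, 8, PAD_IDX, PAD_IDX]])
lengths = torch.tensor([5, 3])

embedded = embedding(tokens)  # [2, 5, 4]
packed = pack_padded_sequence(embedded, lengths,
                              batch_first=True, enforce_sorted=False)
packed_outputs, (hidden, cell) = lstm(packed)
outputs, _ = pad_packed_sequence(packed_outputs, batch_first=True)

# `hidden` is now the state after the last *real* token of each sequence
# (step 5 for the first, step 3 for the second), rather than the state after
# the LSTM has also run over the trailing pad positions.
```

Without packing, `hidden` for the second sequence would be whatever state the LSTM reached after also consuming the two pad embeddings.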
When doing POS tagging we want to do something with a sequence of hidden states, which we get from the `outputs` variable in `outputs, (hidden, cell) = self.lstm(embedded)`. Yes, some of those hidden states will be from padded tokens, but I don't believe there is a nice way in PyTorch to get a sequence of hidden states only up to a certain point. However, this doesn't really matter, because when we pass the padding tag index to the `CrossEntropyLoss` we're telling PyTorch not to calculate losses over these pad tokens. We're only losing the time taken to run the LSTM over the pad tokens, which should be minimized by the `BucketIterator`, which is designed to reduce the amount of padding within a batch.
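As a rough illustration of how the pad tag is excluded from the loss (the `TAG_PAD_IDX` name and the toy tensors below are assumptions, not code from the notebook):

```python
import torch
import torch.nn as nn

TAG_PAD_IDX = 0  # assumed index of the <pad> tag in the tag vocabulary
criterion = nn.CrossEntropyLoss(ignore_index=TAG_PAD_IDX)

# predictions: [batch * seq_len, n_tags], tags: [batch * seq_len]
predictions = torch.randn(6, 5)
tags = torch.tensor([3, 1, 4, TAG_PAD_IDX, TAG_PAD_IDX, 2])

# positions whose target tag equals TAG_PAD_IDX contribute nothing to the
# loss or to the gradients
loss = criterion(predictions, tags)
```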
The `padding_idx=pad_idx` argument to `nn.Embedding` does something different. It means that whenever a token that has been numericalized to `pad_idx` is passed to the `nn.Embedding`, it will return a tensor of zeros at that position, see: https://stackoverflow.com/a/61173091
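A small sketch of that behaviour (the sizes and values are illustrative):

```python
import torch
import torch.nn as nn

PAD_IDX = 0
embedding = nn.Embedding(num_embeddings=5, embedding_dim=3, padding_idx=PAD_IDX)

tokens = torch.tensor([2, PAD_IDX, 4])
print(embedding(tokens))
# the row for PAD_IDX is all zeros, and that embedding vector receives no
# gradient updates during training
```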
Thank you for your reply.

> It means whenever a token that has been numericalized to pad_idx is passed to the nn.Embedding it will return a tensor of zeros at that position.

But why do we need a tensor of zeros at the `pad_idx` positions? Will the pad positions that haven't been packed away, but are still present, affect the training of the LSTM (especially with `bidirectional=True`)? Or, what is the purpose of packing other than reducing training time?

Besides, can I use `pack_padded_sequence` and `pad_packed_sequence` instead of `padding_idx=pad_idx`?

For varying-length sequences, why isn't packing and padding used here?