Sure! Good question.
Just out of curiosity, does our interface not work for you? It sounds like you should be able to just modify this file to your liking:
https://github.com/OpenNMT/OpenNMT-py/blob/master/onmt/modules/ImageEncoder.py
and then use the rest of the pipeline as-is?
I should also mention that this page gives an example of using the model as a library:
http://opennmt.net/OpenNMT-py/Library.html
I think it should still work; let us know if there are issues. Something like this should work:
import torch
import torch.nn as nn
import onmt

emb_size = 10
rnn_size = 6

# Specify the core model.
# vocab, src_padding and tgt_padding come from the preprocessed data (see the note after this block).
encoder_embeddings = onmt.modules.Embeddings(emb_size, len(vocab["src"]),
                                             word_padding_idx=src_padding)
encoder = onmt.modules.RNNEncoder(hidden_size=rnn_size, num_layers=1,
                                  rnn_type="LSTM", bidirectional=True,
                                  embeddings=encoder_embeddings)

decoder_embeddings = onmt.modules.Embeddings(emb_size, len(vocab["tgt"]),
                                             word_padding_idx=tgt_padding)
decoder = onmt.modules.InputFeedRNNDecoder(hidden_size=rnn_size, num_layers=1,
                                           bidirectional_encoder=True,
                                           rnn_type="LSTM",
                                           embeddings=decoder_embeddings)
model = onmt.modules.NMTModel(encoder, decoder)

# Specify the tgt word generator and loss computation module.
model.generator = nn.Sequential(
    nn.Linear(rnn_size, len(vocab["tgt"])),
    nn.LogSoftmax())
loss = onmt.Loss.NMTLossCompute(model.generator, vocab["tgt"])
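Here vocab, src_padding and tgt_padding are assumed to come from the preprocessed data, roughly along these lines (the exact path and keys depend on how you ran preprocessing; "<blank>" is OpenNMT-py's default padding token):

vocab = dict(torch.load("data.vocab.pt"))      # vocab file written by preprocess.py
src_padding = vocab["src"].stoi["<blank>"]     # index of the source-side padding token
tgt_padding = vocab["tgt"].stoi["<blank>"]     # index of the target-side padding token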
I've got the encoder like this:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self):
        super(Encoder, self).__init__()
        self.gru = nn.GRU(input_size=512, hidden_size=256, num_layers=2,
                          batch_first=True, bidirectional=True)
        # 3D (spatio-temporal) convolutions over the image sequence
        self.conv1 = nn.Conv3d(in_channels=1, out_channels=128, kernel_size=(2, 3, 3), stride=2)
        self.bn1 = nn.BatchNorm3d(128)
        self.conv2 = nn.Conv3d(in_channels=128, out_channels=256, kernel_size=(2, 3, 3), stride=2)
        self.bn2 = nn.BatchNorm3d(256)
        # 2D convolutions applied per time step
        self.conv3 = nn.Conv2d(in_channels=256, out_channels=512, kernel_size=(3, 3), stride=2)
        self.bn3 = nn.BatchNorm2d(512)
        self.conv4 = nn.Conv2d(in_channels=512, out_channels=512, kernel_size=(3, 3), stride=2)
        self.bn4 = nn.BatchNorm2d(512)
        self.conv5 = nn.Conv2d(in_channels=512, out_channels=512, kernel_size=(3, 3), stride=2)
        self.bn5 = nn.BatchNorm2d(512)
        self.fc1 = nn.Linear(7680, 512)

    def forward(self, x, lengths=None):
        x = F.leaky_relu(self.bn1(self.conv1(x)))
        x = F.leaky_relu(self.bn2(self.conv2(x)))
        lst = []
        for i in x:  # iterate over the batch dimension
            d = i.permute(1, 0, 2, 3)  # [time, channels, height, width]
            d = F.leaky_relu(self.bn3(self.conv3(d)))
            d = F.leaky_relu(self.bn4(self.conv4(d)))
            d = F.leaky_relu(self.bn5(self.conv5(d)))
            d = d.view(len(d), -1)  # flatten each time step
            d = self.fc1(d)
            lst.append(d)
        output, hidden = self.gru(torch.stack(lst))
        return hidden, output
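As a quick sanity check (on a recent PyTorch; older versions may need Variable wrapping), something like this runs and gives the shapes I expect:

enc = Encoder()
hidden, output = enc(torch.randn(1, 1, 120, 220, 150))  # a batch of one image sequence
# hidden: [num_layers * num_directions, batch, 256]
# output: [batch, seq, 2 * 256] since the GRU is bidirectional with batch_first=True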
I then have:
encoder = Encoder()
decoder_embeddings = onmt.modules.Embeddings(8, len(distinct_tokens),
                                             word_padding_idx=-1)
decoder = onmt.modules.StdRNNDecoder(hidden_size=512, num_layers=2,
                                     bidirectional_encoder=True,
                                     rnn_type="GRU", embeddings=decoder_embeddings)
model = onmt.modules.NMTModel(encoder, decoder)
Does this look about right?
Trying to run
model(src=Variable(torch.randn(1, 1, 120, 220, 150)), tgt=torch.LongTensor([1, 2, 3]).unsqueeze(1), lengths=None)
gives an error at line 306 of onmt's models.py (as of current master), because tgt doesn't have the correct dimensions. Indeed, similar to my question 2, I'm not sure what nfeat means in this context. Why should tgt not be [tgt_len x batch]?
Oh, so you should unsqueeze one more dimension. tgt should be [tgt_len x batch x 1]
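Something like this, reusing your toy example (you may also need the Variable wrapping mentioned below):

tgt = Variable(torch.LongTensor([1, 2, 3]).unsqueeze(1).unsqueeze(2))  # [tgt_len=3, batch=1, nfeat=1]
model(src=Variable(torch.randn(1, 1, 120, 220, 150)), tgt=tgt, lengths=None)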
Thanks. I think the documentation may be conflicting, because forward for NMTModel expects [tgt_len x batch] according to the docstring, but there is no unsqueezing.
Moving on, there seems to be a problem with embeddings. Simply running this code with the current master branch throws
RuntimeError: save_for_backward can only save input or output tensors, but argument 0 doesn't satisfy this condition:
import torch
import onmt
emb = onmt.modules.Embeddings(5, 5, word_padding_idx=-1)
input = torch.LongTensor([1, 2, 3]).unsqueeze(1).unsqueeze(1)
emb(input)
Wrap it in a (torch.autograd.)Variable and you should be good to go!
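That is, something like:

from torch.autograd import Variable

input = Variable(torch.LongTensor([1, 2, 3]).unsqueeze(1).unsqueeze(1))
emb(input)  # no RuntimeError now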
Right! Just testing your awareness...
I've got the NMTModel now completing forward successfully. I think I can now train the model using built-in PyTorch functionality, but I'm still confused about how to beam search. The classes that deal with this seem to expect torchtext objects, which once again I'm not using.
Can you be more specific?
The Translator and Translation classes expect "fields (dict of Fields): data fields", which are presumably torchtext entities. I'm not quite sure how to use them as I'm not using torchtext. This is partly because I am doing my own preprocessing on a custom target dataset and so don't see a need for torchtext, and partly because I don't find torchtext's documentation to be very clear.
If necessary, I may have to wrap my stuff with torchtext. On the face of it though, beam search is a mechanism agnostic of data domains, so design-wise perhaps it could be good to decouple it from torchtext (a rough sketch of what I mean is below).
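To illustrate, all a beam search really needs is a step function mapping (tokens so far, decoder state) to log-probabilities over the next token. A sketch under that assumption, with a hypothetical step_fn rather than OpenNMT's actual API:

def beam_search(step_fn, init_state, bos, eos, beam_size=5, max_len=50):
    # step_fn(tokens, state) -> (1-D tensor of log-probs over the vocab, new_state).
    # Purely tensor/list based; no torchtext, no Fields involved.
    beams = [([bos], 0.0, init_state)]  # (tokens, cumulative log-prob, state)
    for _ in range(max_len):
        candidates = []
        for tokens, score, state in beams:
            if tokens[-1] == eos:  # finished hypotheses keep competing on score
                candidates.append((tokens, score, state))
                continue
            log_probs, new_state = step_fn(tokens, state)
            top_lp, top_ids = log_probs.topk(beam_size)
            for lp, idx in zip(top_lp.tolist(), top_ids.tolist()):
                candidates.append((tokens + [idx], score + lp, new_state))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
        if all(tokens[-1] == eos for tokens, _, _ in beams):
            break
    return max(beams, key=lambda b: b[1])[0]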
Closing this due to lack of activity. Reopen if needed.
Hello, despite scouring the docs for a while I'm having trouble understanding how to adapt the library to my needs.
In my application, my input is a sequence of images which I have already preprocessed. That is, I have a bunch of tensors of shape [source_len, channels, height, width]. The outputs are textual tokens, but I've already preprocessed everything, so that each output is of shape [target_len] and starts and ends with special tokens (I can also one-hot encode them to [target_len, num_of_different_tokens] if need be, as num_of_different_tokens is not large).
I've built my own encoder to my liking for the image sequence, which applies a bunch of 3D (spatio-temporal) convolutions followed by an RNN. I'd now like to use a decoder with attention that uses the encoder's outputs (and, when training, target outputs to feed as inputs). Hopefully I'd like to train this end-to-end with OpenNMT's machinery, and decode at evaluation time with a beam search etc. The main problem I'm facing is trying to disentangle the different inputs/outputs that the library focuses on (using torchtext etc.) and to apply just the pure seq2seq.
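For concreteness, a toy example of the data layout described above, with made-up sizes:

import torch
import torch.nn.functional as F

source_len, channels, height, width = 4, 1, 32, 32   # made-up sizes purely for illustration
num_of_different_tokens = 30

src = torch.randn(source_len, channels, height, width)   # one preprocessed image sequence
tgt = torch.LongTensor([1, 5, 7, 2])                     # [target_len], starts/ends with special tokens
tgt_onehot = F.one_hot(tgt, num_of_different_tokens)     # [target_len, num_of_different_tokens]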
A couple of the specific issues I'm having so far:
1. StdRNNDecoder (for example) requires specifying embeddings, and I haven't been able to wrap my head around the Embeddings object in this library or how to define it for my use case. I don't think I technically need it (the number of different tokens is rather small and I'm happy to simply use one-hot encoded vectors), but I'm fine defining it if I can figure out how.
2. The interface of the EncoderBase object. For example, the forward iteration expects "padded sequences of sparse indices [src_len x batch x nfeat]". What does this mean here?
3. How to wrap everything into an NMTModel, if necessary?
I realise this is a rather open-ended question, but I would appreciate assistance if possible.