Eurus-Holmes / MNMT

PyTorch implementation of Multimodal Neural Machine Translation (MNMT).
https://chenfeiyang.top/MNMT/
MIT License

multimodal: initialize hidden state of encoder + transformer #9


LinuxBeginner commented 4 years ago

Hi, could you please tell me how using the image as additional data to initialise the encoder hidden states (Calixto et al., 2017) takes place when implemented with the transformer model?
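For context, my understanding of the RNN-based IMGE from the paper is roughly the following: a global image feature vector is passed through an affine projection and used as the encoder's initial hidden state. A minimal sketch of that idea (not this repository's code; names like img_proj are mine):

```python
import torch
import torch.nn as nn

class ImageInitGRUEncoder(nn.Module):
    """Sketch of IMGE (Calixto et al., 2017): project a global image
    feature and use it as the RNN encoder's initial hidden state."""

    def __init__(self, vocab_size, emb_dim=500, hid_dim=500, img_dim=4096):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hid_dim)  # affine projection of the image feature

    def forward(self, src_tokens, img_feats):
        # src_tokens: (batch, src_len); img_feats: (batch, img_dim),
        # e.g. one VGG19 fully-connected-layer feature per image
        h0 = torch.tanh(self.img_proj(img_feats)).unsqueeze(0)  # (1, batch, hid_dim)
        return self.rnn(self.embed(src_tokens), h0)
```

A transformer encoder has no recurrent hidden state to initialise, which is why I am unsure how this carries over.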

```
python train_mm.py -data dataset/bpe -save_model model/IMGE_ADAM -gpuid 0 -path_to_train_img_feats image_feat/train_vgg19_bn_cnn_features.hdf5 -path_to_valid_img_feats image_feat/valid_vgg19_bn_cnn_features.hdf5 -enc_layers 6 -dec_layers 6 -encoder_type transformer -decoder_type transformer -position_encoding -epochs 300 -dropout 0.1 -batch_size 128 -batch_type tokens -optim adam -learning_rate 0.01 --multimodal_model_type imge
```

On running the above command, does the system ignore the image features and train a text-only transformer model? @Eurus-Holmes
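One way I could check this myself, assuming the checkpoints follow the usual OpenNMT-py layout where ckpt["model"] holds the parameter state dict, would be to compare parameter names between a text-only and a multimodal checkpoint (paths are illustrative):

```python
import torch

# Illustrative paths; assumes OpenNMT-py-style checkpoints where
# ckpt["model"] is the model's state dict.
text_ckpt = torch.load("model/TEXT_ONLY.pt", map_location="cpu")
mm_ckpt = torch.load("model/IMGE_ADAM.pt", map_location="cpu")

extra = set(mm_ckpt["model"].keys()) - set(text_ckpt["model"].keys())
# If imge is really active, extra should contain image-projection weights;
# an empty set would suggest the image features were ignored.
print(sorted(extra))
```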

Eurus-Holmes commented 4 years ago

@LinuxBeginner Please refer to README.md.

To train a multi-modal NMT model, use the train_mm.py script. In addition to the parameters accepted by the standard train.py (that trains a text-only NMT model), this script expects the path to the training and validation image features, as well as the multi-modal model type (one of imgd, imge, imgw, or src+img).

LinuxBeginner commented 4 years ago

@Eurus-Holmes Thank you for the response. I understand what you are implying. It's just that Calixto et al. (2017) only discuss the attention mechanism of Bahdanau et al. (2014), not the transformer model.

We tried the transformer model with text only (the train.py script) and with the multimodal setup (the train_mm.py script), but there was no improvement in the results and the BLEU scores are almost the same.

So I was under the assumption that, even if I use the train_mm.py script with one of the multi-modal model types to train a transformer model (as in my command above), the script ignores the multimodal approach and simply trains the text-only version.

Our goal is to train a multi-modal NMT (transformer) model.
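If it helps clarify what we expect: one common way to adapt the IMGE idea to a transformer, since there is no recurrent hidden state to initialise, is to project the global image feature to the model dimension and prepend it to the encoder input as a pseudo-token. A rough sketch of that assumption (positional encoding omitted for brevity; this may not be what train_mm.py actually does):

```python
import torch
import torch.nn as nn

class ImageTokenTransformerEncoder(nn.Module):
    """Sketch: condition a transformer encoder on a global image feature
    by prepending it to the source sequence as an extra 'token'."""

    def __init__(self, vocab_size, d_model=512, img_dim=4096, nhead=8, layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.img_proj = nn.Linear(img_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)

    def forward(self, src_tokens, img_feats):
        # src_tokens: (batch, src_len); img_feats: (batch, img_dim)
        img_tok = self.img_proj(img_feats).unsqueeze(1)         # (batch, 1, d_model)
        x = torch.cat([img_tok, self.embed(src_tokens)], dim=1)
        return self.encoder(x)  # (batch, src_len + 1, d_model)
```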

Please correct me if I am wrong.