In lines 404 to 415 of model.py, why do you add the logits computed from the image features to the logits computed from the hidden state to get the final logits? Why not directly multiply the image features with the hidden state to get the final logits?

![qq 20180919225427](https://user-images.githubusercontent.com/22372454/45761526-0402af80-bc5f-11e8-881d-6e50718b5541.png)
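To make the two options concrete, here is a hypothetical pure-Python sketch. It is not the code from model.py, and the names `W_h`, `W_z`, and `W_o` are illustrative weight matrices. It shows additive fusion (project `h` and the attended image context `z` separately, then sum the logits) and multiplicative fusion (elementwise product, then project). It also shows that summing two projected logit vectors is the same as one linear layer applied to the concatenation `[h; z]`.

```python
def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x (list)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def additive_logits(h, z, W_h, W_z):
    """Project h and z to vocabulary size separately, then sum the two logit vectors.
    h and z may have different dimensions."""
    return [a + b for a, b in zip(matvec(W_h, h), matvec(W_z, z))]


def concat_logits(h, z, W_h, W_z):
    """Equivalent single linear layer over [h; z] with weight [W_h | W_z]."""
    W = [row_h + row_z for row_h, row_z in zip(W_h, W_z)]
    return matvec(W, h + z)


def multiplicative_logits(h, z, W_o):
    """Fuse h and z by elementwise product (needs equal dimensions), then project."""
    fused = [hi * zi for hi, zi in zip(h, z)]
    return matvec(W_o, fused)


# Usage: vocabulary of 3, hidden size 2, context size 3.
W_h = [[1, 2], [3, 4], [5, 6]]
W_z = [[1, 0, 1], [0, 1, 0], [2, 2, 2]]
print(additive_logits([1, 1], [1, 2, 3], W_h, W_z))  # [7, 9, 23]
print(concat_logits([1, 1], [1, 2, 3], W_h, W_z))    # [7, 9, 23]
```

One trade-off the sketch makes visible: with the product, a zero entry in `z` removes the matching entry of `h` entirely, so each signal can only gate the other. The sum lets each one contribute to the logits on its own.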
Why don't you convert the attention-weighted image features into the LSTM cell state c, and concatenate them with the word embedding as the input?

![qq 20180920001517](https://user-images.githubusercontent.com/22372454/45766474-4382c900-bc6a-11e8-8dcc-391756a8d28a.png)
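The two ways of feeding the attended context into the decoder can be compared with a minimal pure-Python LSTM step. This is a hypothetical sketch, not the repository's implementation. The gate layout, the omitted biases, and the projection `W_c` are all assumptions. Variant A concatenates the context with the word embedding as the LSTM input. Variant B projects the context and adds it to the cell state `c` before the step.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x (list)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def lstm_step(x, h, c, W):
    """One LSTM step. W maps each gate name ('i', 'f', 'o', 'g') to a matrix
    applied to the concatenation [x; h]. Biases are omitted for brevity."""
    xh = x + h  # list concatenation
    i = [sigmoid(v) for v in matvec(W["i"], xh)]
    f = [sigmoid(v) for v in matvec(W["f"], xh)]
    o = [sigmoid(v) for v in matvec(W["o"], xh)]
    g = [math.tanh(v) for v in matvec(W["g"], xh)]
    c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
    h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
    return h_new, c_new


def step_context_as_input(emb, ctx, h, c, W):
    """Variant A: the context vector is concatenated with the word embedding,
    so it feeds every gate at this step."""
    return lstm_step(emb + ctx, h, c, W)


def step_context_into_cell(emb, ctx, h, c, W, W_c):
    """Variant B (hypothetical): the context is projected by W_c and added to
    the cell state. The gates only see [emb; h], and the injected context is
    scaled by the forget gate."""
    c_injected = [ci + pi for ci, pi in zip(c, matvec(W_c, ctx))]
    return lstm_step(emb, h, c_injected, W)


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


# Usage: embedding size 2, context size 3, hidden size 2, all gate weights zero.
W_a = {k: zeros(2, 2 + 3 + 2) for k in "ifog"}  # input is [emb; ctx; h]
W_b = {k: zeros(2, 2 + 2) for k in "ifog"}      # input is [emb; h]
W_c = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(step_context_into_cell([0.0, 0.0], [2.0, 4.0, 9.0], [0.0, 0.0], [0.0, 0.0], W_b, W_c)[1])
# [1.0, 2.0]
```

With all gate weights set to zero, every gate equals 0.5 and the candidate `g` is zero. In variant B the new cell state is therefore half of the injected context, which shows how the forget gate scales it.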