Open AndreiBarsan opened 7 years ago
Okay, I'll try to get this going, with a basic VGGNet or something at first.
I will skip this and jump straight into a fancier model, so that we also cover a novel technique that wasn't in the lecture.
A simple baseline is described here: https://arxiv.org/pdf/1512.02167.pdf (Simple Baseline for Visual Question Answering)
We have almost all of this already, except for the image CNN. Right now the images are just mapped directly to pre-computed feature vectors, so it would be nice to add the CNN "arm" of the network in order to have a "proper" baseline.
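For reference, here is a rough sketch of how the two arms could be combined, loosely following my reading of the linked paper: a bag-of-words question vector goes through an embedding layer, gets concatenated with the image feature vector, and a softmax layer scores the candidate answers. This is forward pass only, in plain Python, with random weights. All names (`bow_vector`, `predict`, `W_embed`, `W_out`), the toy vocabulary, and the dimensions are made up for illustration and are not taken from our code; `image_features` stands in for either the pre-computed vectors or the output of the future CNN arm.

```python
import math
import random


def bow_vector(question, vocab):
    """Count how often each vocabulary word occurs in the question."""
    vec = [0.0] * len(vocab)
    for token in question.lower().split():
        idx = vocab.get(token.strip("?.,!"))
        if idx is not None:  # out-of-vocabulary words are ignored
            vec[idx] += 1.0
    return vec


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def predict(question, image_features, vocab, W_embed, W_out):
    """Embed the question, concatenate with image features, score answers."""
    text_features = matvec(W_embed, bow_vector(question, vocab))
    joint = text_features + list(image_features)  # concatenation
    return softmax(matvec(W_out, joint))


# Toy setup (hypothetical sizes): 5 words, 4-dim text embedding,
# 8-dim image feature, 3 candidate answers.
rng = random.Random(0)
vocab = {"what": 0, "color": 1, "is": 2, "the": 3, "cat": 4}
answers = ["black", "white", "two"]
image_features = [rng.random() for _ in range(8)]
W_embed = init_matrix(4, len(vocab), rng)
W_out = init_matrix(len(answers), 4 + 8, rng)

probs = predict("What color is the cat?", image_features, vocab, W_embed, W_out)
best_answer = answers[probs.index(max(probs))]
```

With untrained weights the predicted answer is meaningless; the point is only to show where the CNN output would plug in. Swapping the pre-computed vectors for a real CNN should only change how `image_features` is produced.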