AndreiBarsan / visualqa

Visual Question Answering project for ETHZ's 2016/17 Deep Learning class.
Apache License 2.0
5 stars 2 forks source link

Baseline implementation #1

Closed AndreiBarsan closed 7 years ago

AndreiBarsan commented 7 years ago

A simple baseline is described here: https://arxiv.org/pdf/1512.02167.pdf (Simple Baseline for Visual Question Answering)

It uses a bag-of-words approach to model the questions, and a relatively simple CNN for processing the image. It then uses a softmax to pick an answer, i.e. does no actual language synthesis.

AndreiBarsan commented 7 years ago

This is more or less in place; the current baseline is even easier: it uses pre-computed, untrainable 4096-dimensional representations of the actual images.

I'll close this, and readd the CNN part as a separate task.