Training GQNs on CLEVR dataset

I have run some limited experiments training Generative Query Networks (GQNs) on the CLEVR dataset, so that I can try using them in place of ResNet-based embeddings for CLEVR VQA.

I modified the CLEVR dataset generator to produce the additional images and metadata needed to train GQNs on the CLEVR domain. Specifically, it now renders multiple views of the same scene from different perspectives, using a camera that moves along a ring. It also preserves the camera pose, so the pose can be used both for GQN training and, together with the original CLEVR image, for generating an embedding of that image.
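To make the ring camera idea concrete, here is a minimal sketch in plain Python. It is not the code from my branch: the function names, the choice of a horizontal ring centered on the scene origin, and the 7-element pose layout (position plus cos/sin of yaw and pitch, which I believe matches the viewpoint encoding in the GQN paper) are all assumptions for illustration.

```python
import math


def ring_camera_poses(num_views, radius, height, target=(0.0, 0.0, 0.0)):
    """Place cameras evenly on a horizontal ring, each looking at `target`.

    Hypothetical helper: returns a list of (position, yaw, pitch) tuples,
    with angles in radians.
    """
    poses = []
    for i in range(num_views):
        angle = 2.0 * math.pi * i / num_views
        x = target[0] + radius * math.cos(angle)
        y = target[1] + radius * math.sin(angle)
        z = target[2] + height
        # Direction from the camera to the point it looks at.
        dx, dy, dz = target[0] - x, target[1] - y, target[2] - z
        yaw = math.atan2(dy, dx)
        pitch = math.atan2(dz, math.hypot(dx, dy))
        poses.append(((x, y, z), yaw, pitch))
    return poses


def pose_to_vector(position, yaw, pitch):
    """Encode a pose as [x, y, z, cos(yaw), sin(yaw), cos(pitch), sin(pitch)].

    Assumed layout; adjust to whatever the GQN implementation expects.
    """
    return [*position, math.cos(yaw), math.sin(yaw),
            math.cos(pitch), math.sin(pitch)]
```

Each pose would be saved in the scene metadata next to the rendered image, so the same record can serve as GQN training input and as the pose for the original CLEVR view.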

Here is an example from training. The GQN objective is to train the model to predict a new, previously unseen view after being given multiple context views from different angles.
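As a rough illustration of how one training example can be assembled from the views of a scene, the sketch below splits them into context views and a held-out query view. The function name and the (image, pose) tuple layout are hypothetical and not taken from my branch.

```python
import random


def split_context_query(views, num_context, rng=None):
    """Split a scene's views into context views and one held-out query.

    `views` is a list of (image, pose) pairs for a single scene.
    Returns (context, query_pose, query_image); the model sees `context`
    and `query_pose` and is trained to predict `query_image`.
    """
    if not 0 < num_context < len(views):
        raise ValueError("need at least one context view and one query view")
    if rng is None:
        rng = random.Random()
    indices = list(range(len(views)))
    rng.shuffle(indices)
    context = [views[i] for i in indices[:num_context]]
    query_image, query_pose = views[indices[num_context]]
    return context, query_pose, query_image
```

Because the indices are shuffled without replacement, the query view is never one of the context views.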

You can see that even with limited training time it does a decent job of predicting the new view, including mostly accurate shadows. Below is a test-time example.

I saw some promising preliminary results using the baseline models in clevr-iep, and I think this might be an interesting area for others to investigate too. The intuition, at least, is that neural scene representations could improve scene understanding.

Before I clean up my code, I was wondering whether there would be interest in a pull request. Below is the branch that I would generalize and clean up.

https://github.com/facebookresearch/clevr-dataset-gen/compare/master...loganbruns:clevr_gqn

Thanks, logan
