facebookresearch / clevr-dataset-gen

A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

Training GQNs on CLEVR dataset #15

Open loganbruns opened 5 years ago

loganbruns commented 5 years ago

I have run some limited experiments training Generative Query Networks (GQNs) on the CLEVR dataset, so that I can try them in place of ResNet-based embeddings for CLEVR VQA.

I modified the CLEVR dataset generator to create additional images and metadata so that GQNs can be trained on the CLEVR domain. Specifically, it renders multiple views of the same scene from different perspectives, using a camera that moves along a ring. It also preserves the camera pose, so that the pose can be used both for GQN training and, with the original image, for generating embeddings for the CLEVR image.
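To make the ring idea concrete, here is a minimal sketch of how camera poses along a ring could be computed. It is not the code from my branch: the function names `ring_camera_poses` and `gqn_viewpoint`, the parameters, and the Z-up convention (as in Blender) are assumptions for illustration, and the 7-number viewpoint is just the position plus cos/sin of yaw and pitch, a GQN-style encoding.

```python
import math


def ring_camera_poses(num_views, radius, height, target=(0.0, 0.0, 0.0)):
    """Hypothetical helper: cameras evenly spaced on a horizontal ring.

    Assumes a Z-up world; every camera is aimed at `target`.
    """
    poses = []
    for i in range(num_views):
        angle = 2.0 * math.pi * i / num_views
        x = target[0] + radius * math.cos(angle)
        y = target[1] + radius * math.sin(angle)
        z = height
        # Direction from the camera to the point it looks at.
        dx, dy, dz = target[0] - x, target[1] - y, target[2] - z
        yaw = math.atan2(dy, dx)                    # rotation about the Z axis
        pitch = math.atan2(dz, math.hypot(dx, dy))  # elevation of the view direction
        poses.append({"position": (x, y, z), "yaw": yaw, "pitch": pitch})
    return poses


def gqn_viewpoint(pose):
    """Encode a pose as (x, y, z, cos yaw, sin yaw, cos pitch, sin pitch)."""
    x, y, z = pose["position"]
    return (
        x, y, z,
        math.cos(pose["yaw"]), math.sin(pose["yaw"]),
        math.cos(pose["pitch"]), math.sin(pose["pitch"]),
    )


if __name__ == "__main__":
    for pose in ring_camera_poses(num_views=4, radius=2.0, height=1.0):
        print(gqn_viewpoint(pose))
```

Storing one such pose per rendered image in the scene metadata is what allows the same views to serve as both GQN training data and conditioning input later.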

Here is an example from training. For GQNs, the objective is to train the model to predict a new, previously unseen view after being given multiple context views from different angles.

[Image: Screen Shot 2019-06-11 at 6 04 24 AM]

You can see that even with limited training time it does a decent job of predicting the new view, with mostly accurate shadows. Below is a test-time example.

[Image: Screen Shot 2019-06-16 at 1 41 00 PM]
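The objective described earlier (predict an unseen view from several context views) implies splitting each scene's views into a context set and a held-out query. A rough sketch of that split follows; `split_context_query` is a hypothetical helper, not something from my branch or from any GQN library, and it assumes each view is an (image, viewpoint) pair.

```python
import random


def split_context_query(views, num_context, rng=None):
    """Hypothetical helper: pick context views and one held-out query view.

    `views` is a list of (image, viewpoint) pairs for a single scene.
    """
    if not 0 < num_context < len(views):
        raise ValueError("need at least one context view and one view left over")
    rng = rng or random.Random()
    order = list(range(len(views)))
    rng.shuffle(order)
    context = [views[i] for i in order[:num_context]]
    # The query is a view the model has not been shown as context.
    query = views[order[num_context]]
    return context, query


if __name__ == "__main__":
    scene_views = [("image_%d.png" % i, i) for i in range(5)]  # placeholder data
    context, query = split_context_query(scene_views, 3, random.Random(0))
    print(context, query)
```

During training, the model would receive the context pairs plus the query viewpoint and be scored on how well it reconstructs the query image.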

I saw some promising preliminary results using the baseline models in clevr-iep, and I think this might be an interesting area for others to investigate too. The intuition, at least, is that neural scene representations could improve scene understanding.

Before I clean up my code, I was wondering whether there would be interest in a pull request. Below is the branch with the code that I would generalize and clean up.

https://github.com/facebookresearch/clevr-dataset-gen/compare/master...loganbruns:clevr_gqn

Thanks, logan