This repository contains a PyTorch implementation of the paper:
Learning Object-Centric Representations of Multi-object Scenes from Multiple Views
Li Nanbo,
Cian Eastwood,
Robert B. Fisher
NeurIPS 2020 (Spotlight)
Check our video presentation for more: https://youtu.be/Og2ic2L77Pw.
Hardware: at least one CUDA-capable GPU is required to run this code (see the PyTorch installation step below; the training scripts support single- and multi-GPU setups).
Python Environment:
We use Anaconda to manage our python environment. Check the conda installation guide here: https://docs.anaconda.com/anaconda/install/linux/.
Open a new terminal and navigate to the MulMON directory:
cd <YOUR-PATH-TO-MulMON>/MulMON/
Create a new conda environment called "mulmon" and then activate it:
conda env create -f ./conda-env-spec.yml
conda activate mulmon
Install a GPU-supported PyTorch (tested with PyTorch 1.1, 1.2 and 1.7). A PyTorch build that is compatible with both your CUDA version and this code very likely exists; find the matching one-line install command on the official PyTorch site and run it.
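For example, assuming CUDA 10.2, a PyTorch 1.7 build can be installed with the command below; the exact version pairing is only an illustration, so adjust it to match your CUDA setup:
conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.2 -c pytorch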
Install additional packages:
pip install tensorboard
pip install scikit-image
If PyTorch <= 1.2 is used, you will also need to execute: pip install tensorboardX
and import it in the ./trainer/base_trainer.py
file. This can be done by commenting out the 4th line AND uncommenting the 5th line of that file.
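If you are unsure which case applies, a quick check of the installed PyTorch version (a generic one-liner, not part of the repository) tells you whether the tensorboardX step is needed:
python -c "import torch; print(torch.__version__)"   # tensorboardX is only needed if this prints 1.2.x or lower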
Data structure (important):
We organise the datasets using the following directory structure:
<YOUR-PATH>
├── ...
└── mulmon_datasets
├── clevr # place your own CLEVR-MV under this directory if you go the fun way
│ ├── ...
│ ├── clevr_mv
│ │ └── ... (omit) # see clevr_<xxx> for subdirectory details
│ ├── clevr_aug
│ │ └── ... (omit) # see clevr_<xxx> for subdirectory details
│ └── clevr_<xxx>
│ ├── ...
│ ├── data # contains a list of scene files
│ │ ├── CLEVR_new_#.npy # one .npy --> one scene sample
│ │ ├── CLEVR_new_#.npy
│ │ └── ...
│ ├── clevr_<xxx>_train.json # meta information of the training scenes
│ └── clevr_<xxx>_test.json # meta information of the testing scenes
└── GQN
├── ...
└── gqn-jaco
├── gqn_jaco_train.h5
└── gqn_jaco_test.h5
We recommend creating the necessary data folders before downloading/generating the data files:
mkdir <YOUR-PATH>/mulmon_datasets
mkdir <YOUR-PATH>/mulmon_datasets/clevr
mkdir <YOUR-PATH>/mulmon_datasets/GQN
Get Datasets
Download the CLEVR-MV dataset (clevr_mv) and place the archive under the <YOUR-PATH>/mulmon_datasets/clevr/ directory (~1.8GB when extracted).
Download the CLEVR-Aug dataset (clevr_aug) and place the archive under the <YOUR-PATH>/mulmon_datasets/clevr/ directory (~3.8GB when extracted).
Download the GQN-Jaco dataset (gqn-jaco) and place the archive under the <YOUR-PATH>/mulmon_datasets/GQN/ directory (~3.2GB when extracted).
Extract the archives in place. For example, the command for extracting clevr_mv.tar.gz is:
tar -zxvf <YOUR-PATH>/mulmon_datasets/clevr/clevr_mv.tar.gz -C <YOUR-PATH>/mulmon_datasets/clevr/
Note that: 1) we used only a subset of the DeepMind GQN-Jaco dataset (more is available at deepmind/gqn-datasets), and 2) the published clevr_aug dataset differs slightly from the CLE-Aug used in the paper: we added more shapes (such as dolphins) to make the dataset more interesting (and more complex).
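After extraction, a quick listing (the paths follow the directory structure shown above) confirms that everything is in place:
ls <YOUR-PATH>/mulmon_datasets/clevr/clevr_mv/data | head    # should show CLEVR_new_*.npy scene files
ls <YOUR-PATH>/mulmon_datasets/GQN/gqn-jaco/                 # should show gqn_jaco_train.h5 and gqn_jaco_test.h5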
Pre-trained models
Download the pretrained models and place the archive under MulMON/, i.e. the root directory of this repository, then extract it by executing: tar -zxvf ./logs.tar.gz
Note that some of the models are slightly under-trained, so one could train them further to achieve better results (see the Train section below).
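In full, with the repository root as the working directory, the setup amounts to the following; the assumption that the archive unpacks into a ./logs folder is based on the Demo section below:
cd <YOUR-PATH-TO-MulMON>/MulMON/
tar -zxvf ./logs.tar.gz   # place the downloaded logs.tar.gz here first, then extract
ls ./logs                 # the extracted pretrained checkpoints should now be listed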
Configure data path
To run the code, the data path, i.e. the <YOUR-PATH> in the scripts, needs to be configured correctly. For example, suppose the mulmon_datasets folder is stored under ../myDatasets/; to train MulMON on the GQN-Jaco dataset using a single GPU, the 4th line of the ./scripts/train_jaco.sh script should then read:
data_path=../myDatasets/mulmon_datasets/GQN
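A quick sanity check that the script points at an existing dataset folder (generic shell commands, not part of the repository's scripts):
grep -n "data_path=" ./scripts/train_jaco.sh     # should print the 4th line with your configured path
ls ../myDatasets/mulmon_datasets/GQN/            # should list the gqn-jaco folder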
Demo (Environment Test)
Before running the code below, make sure the pretrained models have been downloaded and extracted:
. scripts/demo.sh
Check the ./logs folder for the generated demos.
Train
To train MulMON on the GQN-Jaco dataset using a single GPU, run:
. scripts/train_jaco.sh
To train on the GQN-Jaco dataset using multiple GPUs, run:
. scripts/train_jaco_parallel.sh
To resume training from a saved checkpoint, i.e. a checkpoint-epoch<#number>.pth file, simply append a flag --resume_epoch <#number> to one of the flags in the script files. For example, to resume training from checkpoint-epoch2000.pth on the GQN-Jaco data, we just need to reconfigure the 10th line of ./scripts/train_jaco.sh as:
--input_dir ${data_path} --output_dir ${log_path} --resume_epoch 2000 \
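Since tensorboard is installed as a dependency, training progress can be monitored with the standard TensorBoard CLI; the log directory below is an assumption based on the ./logs output folder used elsewhere in this README:
tensorboard --logdir ./logs --port 6006   # then open http://localhost:6006 in a browser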
Evaluation
To evaluate a trained model, e.g. on the CLEVR data, run:
. scripts/eval_clevr.sh
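As an illustration, the flag portion of ./scripts/eval_clevr.sh could be configured as follows; the flag names are the ones described below, while the values (and the trailing line continuations) are purely illustrative:
--resume_epoch 2000 \
--test_batch 8 --vis_batch 2 --analyse_batch 2 \
--eval_all True --eval_dist True \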
One can configure the evaluation by modifying the following flags in the evaluation scripts (e.g. ./scripts/eval_clevr.sh):
--resume_epoch: specifies which saved model (by checkpoint epoch) to evaluate.
--test_batch: how many batches of test data one uses for evaluation.
--vis_batch: how many batches of output one visualises (saves) during evaluation (note: must be <= --test_batch).
--analyse_batch: how many batches of latent codes one saves for post analysis, e.g. disentanglement (note: must be <= --test_batch).
--eval_all: (boolean) set to True to enable all of [--eval_recon, --eval_seg, --eval_qry_obs, --eval_qry_seg]; each of the four can also be used independently.
--eval_dist: (boolean) save latent codes for disentanglement analysis (note: not controlled by --eval_all).
To prepare a disentanglement analysis, run the scripts/eval_clevr.sh script with the --eval_dist flag set to True and the --analyse_batch variable (which controls how many scenes of latent codes one wants to analyse) set to be greater than 0. This saves the output latent codes and the ground-truth information that allows you to conduct disentanglement quantification using the QEDR framework.
We constantly respond to issues raised about running the code. For further inquiries and discussions (e.g. questions about the paper), email: nanbo.li@ed.ac.uk.
Please cite our paper if you find this code useful.
@inproceedings{nanbo2020mulmon,
title={Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views},
author={Nanbo, Li and Eastwood, Cian and Fisher, Robert B},
booktitle={Advances in Neural Information Processing Systems},
year={2020}
}