InnerPeace-Wu / densecap-tensorflow

A TensorFlow re-implementation of the CVPR 2017 paper "Dense Captioning with Joint Inference and Visual Context", with minor changes (mAP 8.296 after 500k training iterations).
MIT License

Densecap-tensorflow

Implementation of CVPR2017 paper: Dense captioning with joint inference and visual context by Linjie Yang, Kevin Tang, Jianchao Yang, Li-Jia Li

WITH CHANGES:

  1. Borrow the idea from "Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling" and tie the word vectors and word classifiers during captioning.
  2. Initialize the word vectors and word classifiers with pre-trained 300-dimensional GloVe word vectors.
  3. Change the backbone of the framework to ResNet-50.
  4. Add beam search and length normalization in test mode.
  5. Add a "Limit_RAM" mode for preparing the training data, since my computer has only 8 GB of RAM.
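To make change 4 concrete, here is a minimal, framework-free sketch of beam search with length normalization. It is not the code used in this repository: `beam_search`, `toy_step`, the `alpha` exponent, and the toy probability table are all hypothetical names and values chosen for illustration. The idea is that raw log-probabilities favor short captions, so each finished hypothesis is scored by its log-probability divided by its length raised to `alpha`.

```python
import math

# Hypothetical toy "language model": next-token probabilities keyed by the last token.
TOY_TABLE = {
    "<s>": {"dog": 0.5, "a": 0.5},
    "dog": {"</s>": 1.0},
    "a": {"brown": 0.9, "</s>": 0.1},
    "brown": {"dog": 0.9, "</s>": 0.1},
}


def toy_step(prefix):
    """Return a dict mapping each possible next token to its probability."""
    return TOY_TABLE[prefix[-1]]


def beam_search(step_fn, bos, eos, beam_size=3, max_len=10, alpha=1.0):
    """Return the best token sequence under length-normalized log-probability.

    alpha=0.0 disables normalization; alpha=1.0 averages log-prob per token.
    """
    beams = [([bos], 0.0)]  # (tokens, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, logp in beams:
            for tok, p in step_fn(tokens).items():
                if p > 0.0:
                    candidates.append((tokens + [tok], logp + math.log(p)))
        # Keep only the beam_size most probable expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, logp in candidates[:beam_size]:
            if tokens[-1] == eos:
                finished.append((tokens, logp))
            else:
                beams.append((tokens, logp))
        if not beams:
            break
    finished.extend(beams)  # hypotheses that hit max_len without emitting eos

    def normalized(hyp):
        tokens, logp = hyp
        length = max(1, len(tokens) - 1)  # generated tokens, excluding bos
        return logp / (length ** alpha)

    return max(finished, key=normalized)[0]


raw_best = beam_search(toy_step, "<s>", "</s>", alpha=0.0)
norm_best = beam_search(toy_step, "<s>", "</s>", alpha=1.0)
```

In this toy table the short caption has the higher raw probability (0.5 versus 0.405), so `alpha=0.0` selects it, while `alpha=1.0` selects the longer caption because its per-token log-probability is higher.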

Special thanks to valohai for offering computing resources.

Note

Update 2017.12.31

Update 2017.12.20

Dependencies

Install the required Python modules with:

pip install -r lib/requirements.txt

For evaluation, one also needs:

Install the Java runtime with:

sudo apt-get install openjdk-8-jre

Preparing data

Download

Website of Visual Genome Dataset

Unlimited RAM

If you have more than 16 GB of RAM, you can preprocess the dataset with the following command.

$ cd $ROOT/lib
$ python preprocess.py --version [version] --path [raw_data_path] \
        --output_dir [dir] --max_words [max_len]
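As a rough illustration of what a `--max_words` limit could mean during preprocessing, the sketch below tokenizes region phrases, keeps only those within the word limit, and builds a vocabulary from what remains. This is an assumption, not a description of `preprocess.py`: the helper names `tokenize`, `filter_phrases`, and `build_vocab` are hypothetical, and the real script may truncate long phrases rather than discard them.

```python
import re
from collections import Counter


def tokenize(phrase):
    """Lowercase a phrase and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", phrase.lower())


def filter_phrases(phrases, max_words):
    """Keep tokenized phrases that have between 1 and max_words tokens.

    Assumption: over-long phrases are dropped; the actual script may differ.
    """
    kept = []
    for phrase in phrases:
        tokens = tokenize(phrase)
        if 0 < len(tokens) <= max_words:
            kept.append(tokens)
    return kept


def build_vocab(token_lists, min_count=1):
    """Return the sorted list of tokens seen at least min_count times."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    return sorted(tok for tok, n in counts.items() if n >= min_count)


phrases = [
    "A brown dog.",
    "the man is riding a very tall horse on the beach",
    "",
]
kept = filter_phrases(phrases, max_words=5)
vocab = build_vocab(kept)
```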

Limited RAM (less than 16G)

If one has RAM less than 16G.

Compile local libs

$ cd $ROOT/lib
$ make

Train

Add or modify configurations in $ROOT/scripts/dense_cap_config.yml; refer to lib/config.py for more configuration details.
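The relationship between a YAML override file and a defaults module can be sketched as a recursive merge: values from the override replace the defaults, and unknown keys are rejected so that typos do not pass silently. This is only an illustration under stated assumptions. `merge_config` and the keys in `DEFAULTS` are hypothetical and are not taken from `lib/config.py`, and the override is shown as an already-parsed dictionary rather than read from the `.yml` file.

```python
import copy

# Hypothetical defaults; the real keys live in lib/config.py.
DEFAULTS = {
    "TRAIN": {"LEARNING_RATE": 0.001, "BATCH_SIZE": 1},
    "MAX_WORDS": 10,
}


def merge_config(defaults, overrides):
    """Return a new dict with overrides applied recursively on top of defaults."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if key not in merged:
            raise KeyError("unknown config key: %s" % key)
        if isinstance(merged[key], dict) and isinstance(value, dict):
            # Descend into nested sections such as TRAIN.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Stand-in for the parsed contents of an override file.
overrides = {"TRAIN": {"LEARNING_RATE": 0.01}}
cfg = merge_config(DEFAULTS, overrides)
```

Returning a copy leaves the defaults untouched, which makes it easy to compare a run's effective configuration against the baseline.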

$ cd $ROOT
$ bash scripts/dense_cap_train.sh [dataset] [net] [ckpt_to_init] [data_dir] [step]

Parameters:

Demo

Create a directory data/demo

$ mkdir $ROOT/data/demo

Then put the images to be tested in the directory.

Download the pretrained model (500k iterations) from Google Drive or Jbox. Then create an "output" directory under $ROOT:

$ mkdir $ROOT/output

Extract the downloaded "ckpt.zip" into $ROOT/output, then run

$ cd $ROOT
$ bash scripts/dense_cap_demo.sh ./output/ckpt ./output/ckpt/vocabulary.txt

or run

$ bash scripts/dense_cap_demo.sh [ckpt_path] [vocab_path]

for your customized checkpoint directory.

The script creates HTML files in $ROOT/demo; open them in a browser. Alternatively, use the web-based visualizer created by karpathy:

$ cd $ROOT/vis
$ python -m SimpleHTTPServer 8181

Then point your web browser to http://localhost:8181/view_results.html.
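The `SimpleHTTPServer` module exists only in Python 2; in Python 3 the same functionality lives in `http.server`. For readers running the visualizer under Python 3.7 or later, a minimal sketch of an equivalent static file server follows. The function names `make_vis_server` and `serve` are hypothetical, and the default directory `vis` assumes the script is started from $ROOT.

```python
import functools
import http.server


def make_vis_server(directory="vis", port=8181):
    """Build an HTTP server that serves static files from `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("localhost", port), handler)


def serve(directory="vis", port=8181):
    """Serve until interrupted with Ctrl+C, then release the port."""
    server = make_vis_server(directory, port)
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Calling `serve()` from $ROOT should make the page available at the same URL as above.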

TODO:

References