Implementation of the CVPR 2017 paper "Dense Captioning with Joint Inference and Visual Context" by Linjie Yang, Kevin Tang, Jianchao Yang, and Li-Jia Li, with changes: beam search and length normalization in test mode.
Special thanks to valohai for offering computing resources.
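To illustrate the two test-mode changes named above, here is a minimal, generic sketch of beam search with length normalization. It is not taken from this repository: the `step_fn` interface, the toy model, the token names, and the `alpha` exponent are all assumptions made for the example, and the repository's decoder may differ.

```python
import math

def beam_search(step_fn, bos, eos, beam_size=3, max_len=10, alpha=1.0):
    """Return the hypothesis with the best length-normalized log-probability.

    step_fn(tokens) -> {next_token: log_probability} is a hypothetical
    interface standing in for the captioning model's decoder step.
    """
    beams = [([bos], 0.0)]  # (tokens so far, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = [
            (tokens + [tok], score + logp)
            for tokens, score in beams
            for tok, logp in step_fn(tokens).items()
        ]
        # Keep only the beam_size highest-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            (finished if tokens[-1] == eos else beams).append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len still compete

    def normalized(hyp):
        tokens, score = hyp
        length = max(len(tokens) - 1, 1)  # do not count the start token
        return score / length ** alpha

    return max(finished, key=normalized)[0]


# Toy next-token probabilities, keyed by the last token (made up for the example).
TOY_MODEL = {
    "<bos>": {"a": 0.6, "<eos>": 0.4},
    "a": {"b": 0.8, "<eos>": 0.2},
    "b": {"<eos>": 0.8, "c": 0.2},
    "c": {"<eos>": 1.0},
}

def toy_step(tokens):
    return {tok: math.log(p) for tok, p in TOY_MODEL[tokens[-1]].items()}
```

On this toy model, `beam_search(toy_step, "<bos>", "<eos>", alpha=1.0)` returns `['<bos>', 'a', 'b', '<eos>']`, while `alpha=0.0` (no normalization) returns the shorter `['<bos>', '<eos>']`. That bias toward short outputs is what length normalization is meant to counteract.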
Update 2017.12.31
Update 2017.12.20
Install the required Python modules with:
pip install -r lib/requirements.txt
For evaluation, a Java runtime is also needed. Install it with:
sudo apt-get install openjdk-8-jre
The data is available from the website of the Visual Genome Dataset.
Create a directory VG wherever you like.
Download images Part1 and Part2, and extract all (two parts) to directory VG/images.
Download image meta data, and extract it to directory VG/1.2 or VG/1.0 according to the version you download.
Download region descriptions, and extract them to directory VG/1.2 or VG/1.0 accordingly.
The following steps refer to directory VG as raw_data_path, e.g. /home/user/git/VG.
If you have more than 16G of RAM, you can preprocess the dataset with the following command.
$ cd $ROOT/lib
$ python preprocess.py --version [version] --path [raw_data_path] \
--output_dir [dir] --max_words [max_len]
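The directory layout described above can be checked before preprocessing starts. The helper below is a sketch only: the function name `check_vg_layout` is hypothetical, and it checks just the two directories named in the steps above (`images` and the version directory), not the individual files inside them.

```python
import os

def check_vg_layout(raw_data_path, version="1.2"):
    """Return the expected sub-directories that are missing under raw_data_path.

    Only VG/images and VG/<version> are checked, as described in the
    data preparation steps; file names inside them are not verified.
    """
    expected = [
        os.path.join(raw_data_path, "images"),
        os.path.join(raw_data_path, version),
    ]
    return [d for d in expected if not os.path.isdir(d)]
```

For example, `check_vg_layout("/home/user/git/VG", "1.2")` returns an empty list when both directories exist, and otherwise lists the ones still to be created.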
If you have less than 16G of RAM:
First, set up the data path in info/read_regions.py accordingly, and run the script with Python. It will then dump regions in the REGION_JSON directory. Processing more than 100k images takes time, so be patient.
$ cd $ROOT/info
$ python read_regions.py --version [version] --vg_path [raw_data_path]
Next, in lib/preprocess.py, set up the data path accordingly. After running the file, it will dump the gt_regions of every image separately into the OUTPUT_DIR directory.
$ cd $ROOT/lib
$ python preprocess.py --version [version] --path [raw_data_path] \
--output_dir [dir] --max_words [max_len] --limit_ram
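The choice between the two preprocessing variants above can be sketched in code. This is an illustration under stated assumptions: the helper names are hypothetical, "16G" is read as 16 GiB, and the memory probe relies on `os.sysconf`, which is available on Linux but not on every platform. When the memory size cannot be determined, the sketch falls back to `--limit_ram`. Note that the low-RAM path still requires running info/read_regions.py first.

```python
import os

RAM_THRESHOLD_BYTES = 16 * 1024 ** 3  # "16G" from the text, read as 16 GiB (assumption)

def total_ram_bytes():
    """Total physical memory in bytes, or None if it cannot be determined."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None

def preprocess_command(version, raw_data_path, output_dir, max_words, ram_bytes=None):
    """Build the preprocess.py argument list, adding --limit_ram on small machines."""
    cmd = [
        "python", "preprocess.py",
        "--version", str(version),
        "--path", raw_data_path,
        "--output_dir", output_dir,
        "--max_words", str(max_words),
    ]
    if ram_bytes is None:
        ram_bytes = total_ram_bytes()
    # Unknown or small memory: take the conservative, low-RAM variant.
    if ram_bytes is None or ram_bytes < RAM_THRESHOLD_BYTES:
        cmd.append("--limit_ram")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, cwd=...)` with the working directory set to `$ROOT/lib`, matching the commands above.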
Compile the libraries under lib:
$ cd $ROOT/lib
$ make
Add or modify configurations in $ROOT/scripts/dense_cap_config.yml, and refer to lib/config.py for more configuration details.
$ cd $ROOT
$ bash scripts/dense_cap_train.sh [dataset] [net] [ckpt_to_init] [data_dir] [step]
Parameters:
[dataset]: visual_genome_1.2 or visual_genome_1.0.
[data_dir]: the data directory from the prepare data step.
Create a directory data/demo:
$ mkdir $ROOT/data/demo
Then put the images to be tested in the directory.
Download the pretrained model (500k iterations) from Google Drive or Jbox. Then create an "output" directory under $ROOT:
$ mkdir $ROOT/output
Extract the downloaded "ckpt.zip" to directory $ROOT/output.
Then run
$ cd $ROOT
$ bash scripts/dense_cap_demo.sh ./output/ckpt ./output/ckpt/vocabulary.txt
or run
$ bash scripts/dense_cap_demo.sh [ckpt_path] [vocab_path]
for your customized checkpoint directory.
It will create HTML files in $ROOT/demo; open them in a browser to view the results.
Alternatively, you can use the web-based visualizer created by karpathy:
$ cd $ROOT/vis
$ python -m SimpleHTTPServer 8181
Then point your web browser to http://localhost:8181/view_results.html.
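`SimpleHTTPServer` is a Python 2 module; in Python 3 the same functionality lives in `http.server`. As a sketch for Python 3.7 or later, the visualizer directory could be served as follows. The helper name `make_server` is hypothetical, and unlike the command above this version binds to the local interface only.

```python
import functools
import http.server

def make_server(directory, port=8181):
    """Create an HTTP server that serves static files from `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to 127.0.0.1 so the files are reachable from this machine only.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_server("vis").serve_forever()` from `$ROOT` then serves the same files on port 8181 until interrupted.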