This repository contains code for the following paper:
Hendricks, L.A., Akata, Z., Rohrbach, M., Donahue, J., Schiele, B. and Darrell, T., 2016. Generating Visual Explanations. ECCV 2016.
@article{hendricks2016generating,
title={Generating Visual Explanations},
author={Hendricks, Lisa Anne and Akata, Zeynep and Rohrbach, Marcus and Donahue, Jeff and Schiele, Bernt and Darrell, Trevor},
journal={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2016}
}
This code has been edited extensively (you can see old code on deprecated branch). Hopefully it is easier to use, but please bug me if you run into issues.
Additionally, if you are interested in a pytorch version of this code, you can look at Stephan Alaniz's implementation here. Thanks Stephan!
All the models are generated using NetSpec. Please build them by running "build_nets.sh". "build_nets.sh" will also generate bash scripts you can use to train models.
If you would like to retrain my models, please use the following instructions. Note that all my trained models are in "gve_models". All the training scripts will be built using "build_nets.sh"
Please use the bash scripts eval_*.sh to compute image relevance metrics. To compute class relevance metrics, run "analyze_cider_scores.py". This relies on precomputed CIDEr scores between each generated sentence and reference sentences from each class. You can recompute these using "class_meteor_similarity_metric.py" but this will take > 10 hours.
Please note that I retrained the models since the initial arXiv version of the paper was released so the numbers are slightly different, though the main trends remain the same.