This repository contains the PyTorch implementation of our MVP model in the paper: "Multi-View Feature Fusion and Visual Prompt for Remote Sensing Image Captioning".
For more information, please see our early-access paper on IEEE Xplore (accepted by TGRS 2024).
The pretrained CLIP (ViT-B/16) model can be downloaded from here.
See details in `data/README.md`.
```
sh train.sh
```

`--id` in `train.sh` specifies the model name. The training script dumps checkpoints into the folder given by `--checkpoint_path`. Alternatively, all arguments can be specified in a YAML file and loaded with `--cfg`.
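As a rough sketch, such a YAML file might look like the following. This is a hypothetical example: the filename and values are illustrative, and the keys are assumed to mirror the command-line flag names (as in ruotianluo's codebase, which this repo builds on).

```yaml
# Hypothetical config, e.g. configs/mvp.yml
# Keys are assumed to mirror the command-line argument names.
id: mvp_baseline                  # model name (--id)
checkpoint_path: log/mvp_baseline # where checkpoints are dumped (--checkpoint_path)
```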
```
sh test.sh
```

`--id` in `test.sh` is also the model name, and must match the one used in `train.sh`.
If you find this repo useful, please consider citing:
```
@article{wang2024multi,
  title={Multi-View Feature Fusion and Visual Prompt for Remote Sensing Image Captioning},
  author={Wang, Shuang and Lin, Qiaoling and Ye, Xiutiao and Liao, Yu and Quan, Dou and Jin, Zhongqian and Hou, Biao and Jiao, Licheng},
  journal={IEEE Transactions on Geoscience and Remote Sensing},
  year={2024},
  publisher={IEEE}
}
```
Our code is based on the codebase of ruotianluo. If you want to replicate this work, we recommend first following that codebase carefully.
Thanks to the excellent contributors of the ruotianluo and jianjieluo teams.