VLCI
This is the implementation of Cross-Modal Causal Intervention for Medical Report Generation.
It includes the code for Visual-Linguistic Pre-training (VLP) and for fine-tuning with Visual-Linguistic Causal Intervention (VLCI) on the IU-Xray and MIMIC-CXR datasets.
Requirements
All requirements are listed in `requirements.yaml`. Use the following commands to create a new conda environment and activate it.
conda env create -f requirements.yaml
conda activate mrg
Preparation
- Datasets: You can download the datasets with `data/datadownloader.py` or from the R2Gen repository. Then unzip the files into `data/iu_xray` and `data/mimic_cxr`, respectively.
- Models: We provide trained VLCI models for inference; you can download them from here.
- Remember to update the data and model paths in the config files (`config/*.json`).
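Before running evaluation, it can help to confirm that the dataset folders are in place and that a config file points at them. The sketch below assumes the layout described above (`data/iu_xray`, `data/mimic_cxr`) and that each config file is a flat JSON object; both helper functions are hypothetical and not part of this repository, and the key names in the usage comment (`image_dir`, `ann_path`) are placeholders, so check the real field names in `config/*.json` first.

```python
import json
from pathlib import Path


def check_data_dirs(root="data"):
    """Return the expected dataset folders that are missing under `root`."""
    expected = ["iu_xray", "mimic_cxr"]
    return [name for name in expected if not (Path(root) / name).is_dir()]


def update_config_paths(config_file, overrides):
    """Overwrite existing top-level keys in a JSON config and save it.

    Assumes a flat JSON object. Keys that are not already present raise
    KeyError, so a misspelled field name is caught instead of being
    silently added and ignored by the training script.
    """
    path = Path(config_file)
    config = json.loads(path.read_text())
    missing = [key for key in overrides if key not in config]
    if missing:
        raise KeyError(f"keys not found in {path}: {missing}")
    config.update(overrides)
    path.write_text(json.dumps(config, indent=4))
    return config


# Example usage (hypothetical key names; adapt to the actual config fields):
# print(check_data_dirs("data"))
# update_config_paths("config/iu_xray/vlci.json",
#                     {"image_dir": "data/iu_xray/images/",
#                      "ann_path": "data/iu_xray/annotation.json"})
```

Refusing unknown keys is a deliberate choice: a config edit that introduces a new, misspelled field would otherwise leave the old path in effect without any warning.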
Evaluation
- VLCI on the IU-Xray dataset:
python main.py -c config/iu_xray/vlci.json
| Model | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | CIDEr | ROUGE-L | METEOR |
|:-----: |:---: |:---: |:---: |:---: |:---: |:---:|:---: |
| R2Gen | 0.470 | 0.304 | 0.219 | 0.165 |/ |0.371|0.187 |
| CMCL | 0.473 | 0.305 | 0.217 | 0.162 |/ |0.378|0.186 |
| PPKED | 0.483 | 0.315 | 0.224 | 0.168 | 0.351 |0.376|0.190 |
| CA | 0.492 | 0.314 | 0.222 | 0.169 |/ |0.381|0.193 |
| AlignTransformer | 0.484 | 0.313 | 0.225 | 0.173 |/ |0.379|0.204 |
| M2TR | 0.486 | 0.317 | 0.232 | 0.173 |/ |0.390|0.192 |
| MGSK | 0.496 | 0.327 | 0.238 | 0.178 |0.382 |0.381|/ |
| RAMT | 0.482 | 0.310 | 0.221 | 0.165 |/ |0.377|0.195 |
| MMTN | 0.486 | 0.321 | 0.232 | 0.175 |0.361 |0.375|/ |
| DCL | / | / | / | 0.163 |**0.586** |0.383|0.193 |
| VLCI | **0.505** | **0.334** | **0.245** | **0.189** |0.456 |**0.397**|**0.204** |
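BLEU-n (abbreviated B@n in some tables) is the geometric mean of the clipped 1- to n-gram precisions of the generated reports against the references, multiplied by a brevity penalty when the generated text is shorter than the reference. The sketch below is a plain corpus-level implementation with one reference per report and no smoothing. It illustrates the metric only and will not reproduce the numbers above exactly: scores in this line of work are usually computed with the COCO caption evaluation code, whose tokenization and length handling differ. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU-max_n with one tokenized reference per hypothesis.

    Uses uniform weights over the 1..max_n-gram precisions and the standard
    brevity penalty; returns 0.0 if any n-gram order has no matches.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipping: an n-gram is credited at most as often as it occurs
            # in the reference, so repeating a correct phrase does not help.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize generated text that is shorter than the references overall.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)
```

For example, scoring the generated report "heart size is normal" against the reference "the heart size is normal" gives perfect n-gram precision, so the score is just the brevity penalty, exp(1 - 5/4).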
- VLCI on the MIMIC-CXR dataset:
python main.py -c config/mimic_cxr/vlci.json
| Model | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | CIDEr | ROUGE-L | METEOR | CE-Precision | CE-Recall | CE-F1 |
|:-----: |:---: |:---: |:---: |:---: |:---:|:---:|:---: |:---: |:---: |:---: |
| R2Gen | 0.353 | 0.218 | 0.145 | 0.103 |/ |0.277|0.142 | 0.333 | 0.273 | 0.276 |
| CMCL | 0.334 | 0.217 | 0.140 | 0.097 |/ |0.281|0.133 | / | / | / |
| PPKED | 0.360 | 0.224 | 0.149 | 0.106 |0.237|**0.284**|0.149 | / | / | / |
| CA | 0.350 | 0.219 | 0.152 | 0.109 |/ |0.283|0.151 | 0.352 | 0.298 | 0.303 |
| AlignTransformer | 0.378 | 0.235 | 0.156 | 0.112 |/ |0.283|0.158 | / | / | / |
| M2TR | 0.378 | 0.232 | 0.154 | 0.107 |/ |0.272|0.145 | 0.240 | **0.428** | 0.308 |
| MGSK | 0.363 | 0.228 | 0.156 | 0.115 |0.203|**0.284**|/ | 0.458 | 0.348 | 0.371 |
| RAMT | 0.362 | 0.229 | 0.157 | 0.113 |/ |**0.284**|0.153 | 0.380 | 0.342 | 0.335 |
| MMTN | 0.379 | 0.238 | 0.159 | 0.116 |/ |0.283|**0.161** | / | / | / |
| DCL | / | / | / | 0.109 |**0.281**|**0.284**|0.150 | 0.471 | 0.352 |0.373 |
| VLCI | **0.400** | **0.245** | **0.165** | **0.119** | 0.190 | 0.280 | 0.150 | **0.489** | 0.340 | **0.401** |
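The CE columns report clinical efficacy: instead of comparing words, a labeler (CheXpert, in the R2Gen line of work) extracts observation labels from both the generated and the reference report, and the two binary label vectors are compared with precision, recall and F1. The sketch below assumes the labels have already been extracted as 0/1 vectors (the labeler itself is not shown), and the function name `ce_scores` is hypothetical. Papers differ in how they average over reports, so it supports both micro averaging and per-report (example-based) averaging; which scheme each row above uses is not stated here. The choice matters: for several rows, CE-F1 is not the harmonic mean of CE-P and CE-R (for DCL, 0.471 and 0.352 would give about 0.403, not 0.373).

```python
def ce_scores(reference_labels, generated_labels, average="micro"):
    """Precision, recall and F1 over binary observation labels.

    Each argument is a list of equal-length 0/1 vectors, one per report.
    average="micro" pools true/false positives over all reports;
    average="example" scores each report separately and takes the mean.
    """
    if len(reference_labels) != len(generated_labels):
        raise ValueError("need one generated label vector per reference")

    def prf(tp, fp, fn):
        # Zero denominators score 0.0; a report with no positive labels in
        # either vector therefore scores 0 (other conventions exist).
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

    counts = []
    for ref, gen in zip(reference_labels, generated_labels):
        tp = sum(1 for a, b in zip(ref, gen) if a and b)
        fp = sum(1 for a, b in zip(ref, gen) if b and not a)
        fn = sum(1 for a, b in zip(ref, gen) if a and not b)
        counts.append((tp, fp, fn))

    if average == "micro":
        tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
        return prf(tp, fp, fn)
    if average == "example":
        per_report = [prf(*c) for c in counts]
        n = len(per_report)
        return tuple(sum(s[i] for s in per_report) / n for i in range(3))
    raise ValueError(f"unknown average: {average}")
```

With two reports, one half-correct and one fully correct, micro averaging gives 2/3 for all three scores while example-based averaging gives 0.75, which shows why CE numbers from different papers are only comparable under the same averaging scheme.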
Citation
If you use this code for your research, please cite our paper.
@misc{chen2023crossmodal,
title={Cross-Modal Causal Intervention for Medical Report Generation},
author={Weixing Chen and Yang Liu and Ce Wang and Jiarui Zhu and Guanbin Li and Liang Lin},
year={2023},
eprint={2303.09117},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Contact
If you have any questions about this code, feel free to contact me (chen867820261@gmail.com).
Acknowledgements
We thank the authors of R2Gen for their open-source work.