VLCI

This is the implementation of Cross-Modal Causal Intervention for Medical Report Generation. It contains the code for Visual-Linguistic Pre-training (VLP) and for fine-tuning via Visual-Linguistic Causal Intervention (VLCI) on the IU-Xray and MIMIC-CXR datasets.

Requirements

All requirements are listed in requirements.yaml. Use the following commands to create a new environment and activate it:

conda env create -f requirements.yaml
conda activate mrg

Preparation

Datasets: Download the datasets with data/datadownloader.py, or get them from the R2Gen repository. Then unzip the files into data/iu_xray and data/mimic_cxr, respectively.
Models: We provide trained VLCI models for inference, which you can download from here.
Remember to update the data and model paths in the config files (config/*.json).
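The path update can also be scripted. The following is a minimal sketch, not part of the repository: the helper `override_config_paths` and the key names in the usage comment (`data_dir`, `model_path`) are hypothetical, so check the actual keys in your config file before using it.

```python
import json
from pathlib import Path


def override_config_paths(config_file, overrides, output_file=None):
    """Load a JSON config, replace the given top-level keys, and save it.

    Hypothetical helper: it assumes the config is a flat JSON object and
    refuses to add keys that the config does not already contain.
    """
    config_path = Path(config_file)
    config = json.loads(config_path.read_text(encoding="utf-8"))

    # Fail loudly on misspelled keys instead of silently adding new ones.
    unknown = [key for key in overrides if key not in config]
    if unknown:
        raise KeyError(f"keys not present in {config_path}: {unknown}")

    config.update(overrides)
    target = Path(output_file) if output_file else config_path
    target.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example call (key names are assumptions, adapt them to your config):
# override_config_paths(
#     "config/iu_xray/vlci.json",
#     {"data_dir": "data/iu_xray", "model_path": "path/to/downloaded/model"},
# )
```

Rejecting unknown keys keeps a typo from producing a config that looks updated but still points at the old paths.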

Evaluation

For VLCI on IU-Xray dataset

python main.py -c config/iu_xray/vlci.json

B@n: BLEU-n, C: CIDEr, R: ROUGE-L, M: METEOR. A slash (/) marks a value that is not reported.

| Model | B@1 | B@2 | B@3 | B@4 | C | R | M |
|:-----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| R2Gen | 0.470 | 0.304 | 0.219 | 0.165 | / | 0.371 | 0.187 |
| CMCL | 0.473 | 0.305 | 0.217 | 0.162 | / | 0.378 | 0.186 |
| PPKED | 0.483 | 0.315 | 0.224 | 0.168 | 0.351 | 0.376 | 0.190 |
| CA | 0.492 | 0.314 | 0.222 | 0.169 | / | 0.381 | 0.193 |
| AlignTransformer | 0.484 | 0.313 | 0.225 | 0.173 | / | 0.379 | 0.204 |
| M2TR | 0.486 | 0.317 | 0.232 | 0.173 | / | 0.390 | 0.192 |
| MGSK | 0.496 | 0.327 | 0.238 | 0.178 | 0.382 | 0.381 | / |
| RAMT | 0.482 | 0.310 | 0.221 | 0.165 | / | 0.377 | 0.195 |
| MMTN | 0.486 | 0.321 | 0.232 | 0.175 | 0.361 | 0.375 | / |
| DCL | / | / | / | 0.163 | **0.586** | 0.383 | 0.193 |
| VLCI | **0.505** | **0.334** | **0.245** | **0.189** | 0.456 | **0.397** | **0.204** |

For VLCI on MIMIC-CXR dataset

python main.py -c config/mimic_cxr/vlci.json

B@n: BLEU-n, C: CIDEr, R: ROUGE-L, M: METEOR, CE-P/CE-R/CE-F1: clinical efficacy precision, recall, and F1. A slash (/) marks a value that is not reported.

| Model | B@1 | B@2 | B@3 | B@4 | C | R | M | CE-P | CE-R | CE-F1 |
|:-----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| R2Gen | 0.353 | 0.218 | 0.145 | 0.103 | / | 0.277 | 0.142 | 0.333 | 0.273 | 0.276 |
| CMCL | 0.334 | 0.217 | 0.140 | 0.097 | / | 0.281 | 0.133 | / | / | / |
| PPKED | 0.360 | 0.224 | 0.149 | 0.106 | 0.237 | **0.284** | 0.149 | / | / | / |
| CA | 0.350 | 0.219 | 0.152 | 0.109 | / | 0.283 | 0.151 | 0.352 | 0.298 | 0.303 |
| AlignTransformer | 0.378 | 0.235 | 0.156 | 0.112 | / | 0.283 | 0.158 | / | / | / |
| M2TR | 0.378 | 0.232 | 0.154 | 0.107 | / | 0.272 | 0.145 | 0.240 | 0.428 | 0.308 |
| MGSK | 0.363 | 0.228 | 0.156 | 0.115 | 0.203 | **0.284** | / | 0.458 | 0.348 | 0.371 |
| RAMT | 0.362 | 0.229 | 0.157 | 0.113 | / | **0.284** | 0.153 | 0.380 | 0.342 | 0.335 |
| MMTN | 0.379 | 0.238 | 0.159 | 0.116 | / | 0.283 | **0.161** | / | / | / |
| DCL | / | / | / | 0.109 | **0.281** | **0.284** | 0.150 | 0.471 | 0.352 | 0.373 |
| VLCI | **0.400** | **0.245** | **0.165** | **0.119** | 0.190 | 0.280 | 0.150 | **0.489** | **0.340** | **0.401** |
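Both evaluation commands follow the same pattern, `python main.py -c config/<dataset>/vlci.json`, so they can be driven from one small script. This is a sketch under that assumption only: `build_eval_command` and `run_all` are hypothetical helpers, not part of the repository, and they must be run from the repository root.

```python
import subprocess
import sys
from pathlib import PurePosixPath

# Dataset folder names as they appear in the config paths of this README.
DATASETS = ("iu_xray", "mimic_cxr")


def build_eval_command(dataset, model="vlci"):
    """Return the argument list for evaluating one dataset."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}, expected one of {DATASETS}")
    config = PurePosixPath("config") / dataset / f"{model}.json"
    # sys.executable keeps the run inside the currently active environment.
    return [sys.executable, "main.py", "-c", str(config)]


def run_all(datasets=DATASETS):
    """Run the evaluation for each dataset, stopping at the first failure."""
    for dataset in datasets:
        subprocess.run(build_eval_command(dataset), check=True)
```

Calling `run_all()` with the `mrg` environment active evaluates IU-Xray first and then MIMIC-CXR.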

Citation

If you use this code for your research, please cite our paper.

@misc{chen2023crossmodal,
      title={Cross-Modal Causal Intervention for Medical Report Generation}, 
      author={Weixing Chen and Yang Liu and Ce Wang and Jiarui Zhu and Guanbin Li and Liang Lin},
      year={2023},
      eprint={2303.09117},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

If you have any questions about this code, feel free to contact me (chen867820261@gmail.com).

Acknowledgements

We thank the authors of R2Gen for their open-source work.
