aioz-ai / MICCAI21_MMQ

Multiple Meta-model Quantifying for Medical Visual Question Answering (MICCAI 2021)
https://blog.ai.aioz.io/research/vqa-mmq/
MIT License
35 stars 13 forks source link
deep-learning medical medical-image-processing question-answering vqa

Multiple Meta-model Quantifying for Medical Visual Question Answering (MMQ-VQA)

This repository is the implementation of MMQ for the visual question answering task in medical domain. Our proposal achieves superior accuracy in comparison with other state-of-the-art (sota) methods on two public medical VQA datasets: PathVQA dataset and VQA-RAD dataset.

For the detail, please refer to link.

This repository is based on and inspired by @Binh D. Nguyen's work. We sincerely thank for their sharing of the codes.

Prerequisites

PYTHON 3.6

CUDA 9.2

Please install dependence package by run following command:

pip install -r requirements.txt

MMQ Progress

Fig-1

Figure 1: Multiple Meta-model Quantifying in medical VQA. Dotted lines denote looping steps, the number of loop equals to $m$ required meta-models.

Preprocessing

Important: Before running any command lines, please run following command to access 'mmq_maml' folder:

$ cd mmq_maml

And now, you are in 'mmq_maml' folder.

Traing MAML models with MMQ

Train MAML models with MMQ on PathVQA dataset:

$ sh run_pathVQA.sh

Train MAML models with MMQ on VQA-RAD dataset:

$ sh run_VQA_RAD.sh

VQA Progress

Fig-2

Figure 2: Our VQA framework is designed to integrate robust image features extracted from multiple meta-models outputted from MMQ.

Important: For all VQA experiments, you should be in the 'root' folder.

Preprocessing

PathVQA dataset for VQA task

All data should be downloaded via link. The downloaded file should be extracted to data_PathVQA/ directory.

VQA-RAD dataset for VQA task

All data should be downloaded via link. The downloaded file should be extracted to data_RAD/ directory.

Experimental results

MMQ results on PathVQA test set.

m n Free-form Yes/No Overall
MAML - - 5.8 79.5 42.9
MEVF - - 8.1 81.4 47.1
MMQ 5 3 13.4 84.0 48.8
MMQ + MEVF 5 2 13.9 83.8 49.0

MMQ results on VQA-RAD test set.

m n Open-ended Close-ended Overall
MAML - - 40.1 72.4 59.6
MEVF - - 43.9 75.1 62.7
MMQ 5 3 53.7 75.8 67
MMQ + MEVF 5 2 56.9 75.7 68.2

We have considered the recommendation of our reviewers about integrating MMQ into the MEVF. The setup further improves the overall performance in both PathVQA and VQA-RAD datasets. The number of parameters is only a 3% increase in comparison with our original MMQ. We please to provide the pre-trained weights of our state-of-the-art (SOTA) models in here .

Training

Train MMQ + MEVF model with Bilinear Attention Network on PathVQA dataset.

$ sh run_vqa_PathVQA.sh

Train MMQ + MEVF model with Bilinear Attention Network on VQA-RAD dataset.

$ sh run_vqa_VQA_RAD.sh

Pretrained models and Testing

For our SOTA model on PathVQA dataset MMQ_BAN_MEVF_pathVQA. Please download the link and move to saved_models/MMQ_BAN_MEVF_pathVQA/. The trained MMQ_BAN_MEVF_pathVQA model can be tested in PathVQA test set via:

$ sh run_test_PathVQA.sh

For our SOTA model on VQA-RAD dataset MMQ_BAN_MEVF_vqaRAD. Please download the link and move to saved_models/MMQ_BAN_MEVF_vqaRAD/. The trained MMQ_BAN_MEVF_vqaRAD model can be tested in VQA-RAD test set via:

$ sh run_test_VQA_RAD.sh

The result json file can be found in the directory results/.

We also provides the pretrained meta-models and CDAE models for further investigation as belows:

Citation

If you use this code as part of any published research, we'd really appreciate it if you could cite the following paper:

@inproceedings{aioz_mmq_miccai21,
  author={Tuong Do and Binh X. Nguyen and Erman Tjiputra and Minh Tran and Quang D. Tran and Anh Nguyen},
  title={Multiple Meta-model Quantifying for Medical Visual Question Answering},
  booktitle = {MICCAI},
  year={2021}
}

If you find that the Mixture of Enhanced Visual Features (MEVF) model for MedVQA is useful, you could cite the following paper:

@inproceedings{aioz_mevf_miccai19,
  author={Binh D. Nguyen, Thanh-Toan Do, Binh X. Nguyen, Tuong Do, Erman Tjiputra, Quang D. Tran},
  title={Overcoming Data Limitation in Medical Visual Question Answering},
  booktitle = {MICCAI},
  year={2019}
}

License

MIT License