This repository is the implementation of MMQ for the visual question answering task in the medical domain. Our proposal achieves superior accuracy in comparison with other state-of-the-art (SOTA) methods on two public medical VQA datasets: the PathVQA dataset and the VQA-RAD dataset.
For details, please refer to the link.
This repository is based on and inspired by @Binh D. Nguyen's work. We sincerely thank them for sharing their code.
Python 3.6
CUDA 9.2
Please install the dependency packages by running the following command:
pip install -r requirements.txt
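As a quick sanity check after installation, the snippet below (a minimal sketch, not part of the repository; it assumes PyTorch is among the installed dependencies) prints the Python version and whether CUDA is visible:

```python
# check_env.py -- quick environment sanity check (hypothetical helper, not part of the repo)
import sys

import torch  # assumes PyTorch is installed via requirements.txt

# The repository targets Python 3.6 and CUDA 9.2.
print("Python version :", sys.version.split()[0])
print("PyTorch version:", torch.__version__)
print("CUDA available :", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA device    :", torch.cuda.get_device_name(0))
```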
Important: Before running any of the command lines below, please run the following command to enter the `mmq_maml` folder:
$ cd mmq_maml
You are now in the `mmq_maml` folder.
The downloaded PathVQA data for MAML should be extracted to the `data/pathvqa_maml/` directory.
The downloaded VQA-RAD data for MAML should be extracted to the `data/vqarad_maml/` directory.
Train MAML models with MMQ on the PathVQA dataset:
$ sh run_pathVQA.sh
Train MAML models with MMQ on the VQA-RAD dataset:
$ sh run_VQA_RAD.sh
Important: For all VQA experiments, you should be in the 'root' folder.
All data should be downloaded via the link. The downloaded file should be extracted to the `data_PathVQA/` directory.
All data should be downloaded via the link. The downloaded file should be extracted to the `data_RAD/` directory.
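To confirm that the archives were extracted to the expected locations, a small check such as the following can help (a hypothetical helper, not part of the repository; it only relies on the two directory names mentioned above):

```python
# check_data.py -- verify that the VQA data folders exist (hypothetical helper)
import os

# Directory names taken from the instructions above.
for data_dir in ("data_PathVQA", "data_RAD"):
    if os.path.isdir(data_dir):
        # List a few entries so you can eyeball the extracted contents.
        entries = sorted(os.listdir(data_dir))
        print("{}: {} entries, e.g. {}".format(data_dir, len(entries), entries[:10]))
    else:
        print("{}: missing -- extract the downloaded file here first".format(data_dir))
```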
MMQ results on the PathVQA test set.
| Method | m | n | Free-form | Yes/No | Overall |
|---|---|---|---|---|---|
| MAML | - | - | 5.8 | 79.5 | 42.9 |
| MEVF | - | - | 8.1 | 81.4 | 47.1 |
| MMQ | 5 | 3 | 13.4 | 84.0 | 48.8 |
| MMQ + MEVF | 5 | 2 | 13.9 | 83.8 | 49.0 |
MMQ results on the VQA-RAD test set.
| Method | m | n | Open-ended | Close-ended | Overall |
|---|---|---|---|---|---|
| MAML | - | - | 40.1 | 72.4 | 59.6 |
| MEVF | - | - | 43.9 | 75.1 | 62.7 |
| MMQ | 5 | 3 | 53.7 | 75.8 | 67.0 |
| MMQ + MEVF | 5 | 2 | 56.9 | 75.7 | 68.2 |
We have considered our reviewers' recommendation about integrating MMQ into MEVF. This setup further improves the overall performance on both the PathVQA and VQA-RAD datasets, while increasing the number of parameters by only about 3% in comparison with our original MMQ. We are pleased to provide the pre-trained weights of our state-of-the-art (SOTA) models here.
Train the MMQ + MEVF model with Bilinear Attention Network on the PathVQA dataset:
$ sh run_vqa_PathVQA.sh
Train the MMQ + MEVF model with Bilinear Attention Network on the VQA-RAD dataset:
$ sh run_vqa_VQA_RAD.sh
For our SOTA model on the PathVQA dataset, `MMQ_BAN_MEVF_pathVQA`, please download the weights from the link and move them to the `saved_models/MMQ_BAN_MEVF_pathVQA/` directory. The trained `MMQ_BAN_MEVF_pathVQA` model can then be tested on the PathVQA test set via:
$ sh run_test_PathVQA.sh
For our SOTA model on the VQA-RAD dataset, `MMQ_BAN_MEVF_vqaRAD`, please download the weights from the link and move them to the `saved_models/MMQ_BAN_MEVF_vqaRAD/` directory. The trained `MMQ_BAN_MEVF_vqaRAD` model can then be tested on the VQA-RAD test set via:
$ sh run_test_VQA_RAD.sh
The resulting JSON file can be found in the `results/` directory.
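The schema of the result file is not documented here, so the sketch below (a hypothetical helper, not part of the repository) simply loads any JSON files found in `results/` and prints their top-level structure:

```python
# inspect_results.py -- peek at the prediction files written to results/ (hypothetical helper)
import glob
import json

for path in sorted(glob.glob("results/*.json")):
    with open(path) as f:
        data = json.load(f)
    # The schema is not documented here, so just report the top-level structure.
    if isinstance(data, dict):
        print("{}: dict with keys {}".format(path, list(data.keys())[:10]))
    elif isinstance(data, list):
        print("{}: list of {} entries, first entry: {}".format(path, len(data), data[0] if data else None))
    else:
        print("{}: {}".format(path, type(data).__name__))
```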
We also provide the pretrained meta-models and CDAE models for further investigation:
`data_RAD/maml/*.pth`: trained by using our MMQ source code.
`data_RAD/pretrained_ae.pth`.
`data_PathVQA/maml/*.pth`: trained by using our MMQ source code.
`data_PathVQA/pretrained_ae.pth`.
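For a first look at these checkpoints, the sketch below (a hypothetical helper, not part of the repository; it assumes the `.pth` files are standard PyTorch checkpoints, possibly wrapped in a dict) loads them on the CPU and lists a few of the stored tensors:

```python
# inspect_checkpoints.py -- list tensors stored in the provided .pth files (hypothetical helper)
import glob

import torch

# Paths taken from the list above; adjust the glob if your layout differs.
paths = glob.glob("data_RAD/maml/*.pth") + ["data_RAD/pretrained_ae.pth"]

for path in paths:
    checkpoint = torch.load(path, map_location="cpu")
    # A checkpoint may be a bare state dict or a dict wrapping one (assumption).
    state_dict = checkpoint.get("state_dict", checkpoint) if isinstance(checkpoint, dict) else checkpoint
    print(path)
    for name, tensor in list(state_dict.items())[:5]:
        shape = tuple(tensor.shape) if hasattr(tensor, "shape") else type(tensor).__name__
        print("  {}: {}".format(name, shape))
```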
If you use this code as part of any published research, we'd really appreciate it if you could cite the following paper:
@inproceedings{aioz_mmq_miccai21,
author={Tuong Do and Binh X. Nguyen and Erman Tjiputra and Minh Tran and Quang D. Tran and Anh Nguyen},
title={Multiple Meta-model Quantifying for Medical Visual Question Answering},
booktitle = {MICCAI},
year={2021}
}
If you find the Mixture of Enhanced Visual Features (MEVF) model for MedVQA useful, please also consider citing the following paper:
@inproceedings{aioz_mevf_miccai19,
author={Binh D. Nguyen and Thanh-Toan Do and Binh X. Nguyen and Tuong Do and Erman Tjiputra and Quang D. Tran},
title={Overcoming Data Limitation in Medical Visual Question Answering},
booktitle = {MICCAI},
year={2019}
}
MIT License