ConceptBert

This repository is the implementation of ConceptBert: Concept-Aware Representation for Visual Question Answering.

Original paper: François Gardères, Maryam Ziaeefard, Baptiste Abeloos, Freddy Lécué: ConceptBert: Concept-Aware Representation for Visual Question Answering. EMNLP (Findings) 2020: 489-498 https://aclanthology.org/2020.findings-emnlp.44.pdf

For an overview of the pipeline, please refer to the following picture:

Pipeline

License

This work is dual-licensed under the Thales Digital Solutions Canada license and MIT License.

The main license is the Thales Digital Solutions Canada one. You can find the license file here.
This repository is based on and inspired by Facebook research (vilbert-multi-task). We sincerely thank the authors for sharing their code. The code related to vilbert-multi-task is licensed under the MIT License; please refer to the file for more information.

Pre-requisite

python 3.6.12
docker environment

Disclaimer

Currently, the project requires a lot of resources to be able to run correctly.

Allow at least 6 days for the first training on a GTX 1080 Ti (11 GB), or 17 hours in a Kubernetes environment with 7 GPUs (7 Titan V, 32 GB). All the pipelines were tested on a GPU server with four GeForce RTX 2080 Ti (12 GB).

:electric_plug: Data

ℹ️ Notes:

All information regarding the datasets or models used is specified in the original paper.

The original validation file and the pre-trained model are available on the project's Kaggle page: https://www.kaggle.com/thalesgroup/conceptbert/

Our implementation uses the pretrained features from bottom-up-attention (100 fixed features per image) and the GloVe vectors. The data should be saved in a folder along with pretrained_models and organized as shown below:

vilbert
├── data2
│   ├── coco (visual features)
│   ├── conceptnet (conceptnet facts)
│   ├── conceptual_captions (captions for each image, extracted from (https://github.com/google-research-datasets/conceptual-captions))
│   ├── kilbert_base_model (pre-trained weights for initial conceptBert model)
│   ├── OK-VQA (OK-VQA dataset)
│   ├── save_final (final saved models and outputs)
│   ├── tensorboards (location to save tensorboard files)
│   ├── VQA (VQA dataset)
│   ├── VQA_bert_base_6layer_6conect-pretrained (pre-trained weights for initial vilbert model trained on vqa)

The model checkpoints will be saved in the output folder: ./outputs/
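Before starting a long training run, it can help to confirm that the folders listed above are in place. The following is a minimal sketch, not part of the repository: the helper name `missing_data_dirs` is hypothetical, and the root path `/nas-data/vilbert/data2` is an assumption taken from the Docker examples in this README.

```python
from pathlib import Path

# Sub-folders listed in the layout above (relative to vilbert/data2).
EXPECTED_DIRS = [
    "coco",
    "conceptnet",
    "conceptual_captions",
    "kilbert_base_model",
    "OK-VQA",
    "save_final",
    "tensorboards",
    "VQA",
    "VQA_bert_base_6layer_6conect-pretrained",
]


def missing_data_dirs(data_root):
    """Return the expected sub-folders that are absent from data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumed mount point; adjust it to wherever your data actually lives.
    missing = missing_data_dirs("/nas-data/vilbert/data2")
    print("Missing folders:", missing or "none")
```

An empty result means every folder from the layout was found; any names returned are the folders still to be created or downloaded.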

:whale2: Docker installation (recommended)

You can choose to run ConceptBert with Docker or directly in your own environment.

Build

  docker build -t conceptbert .

Start the container

  docker run -it -v /path/to/your/nas/:/nas-data/ conceptbert:latest bash

Additional parameters

  docker run -it --shm-size=10g -e CUDA_VISIBLE_DEVICES=0,1,2,3 -v /path/to/your/nas/:/nas-data/ conceptbert:latest bash

--shm-size prevents shared-memory errors. Here the value is 10 GB (refer to the Docker documentation).
-e CUDA_VISIBLE_DEVICES selects which of the available GPUs are used. Here we want to use 4 GPUs.
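Once inside the container, you may want to check which GPU ids were actually passed through. The snippet below is a small illustrative sketch (the helper name `visible_gpu_ids` is hypothetical). It only parses the environment variable and does not query the GPUs themselves.

```python
import os


def visible_gpu_ids(value):
    """Split a CUDA_VISIBLE_DEVICES string such as "0,1,2,3" into a list of ids.

    Ids are kept as strings. An unset or empty variable yields an empty list,
    which here simply means that no explicit selection was made.
    """
    if not value:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


if __name__ == "__main__":
    ids = visible_gpu_ids(os.environ.get("CUDA_VISIBLE_DEVICES", ""))
    print(f"{len(ids)} GPU id(s) requested: {ids}")
```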

When the container is up, go to the section 1. Train with VQA

Other installation

You can use the requirements.txt file to install the dependencies of the project.

Pre-requisite:

Compile the tools: cd conceptBert/tools/refer && make
python 3.6.x

If you have difficulty creating your environment, look at the contents of the Dockerfile for the dependencies you might be missing.

:rocket: Training and Validation

Note: the models and JSON files used in the following examples are those that gave the current best results.

1. Train with VQA

First, we use the VQA dataset to train a baseline model. Use the following command:

  python3 -u train_tasks.py --model_version 3 --bert_model=bert-base-uncased --from_pretrained_conceptBert None \
      --from_pretrained=/nas-data/vilbert/data2/kilbert_base_model/pytorch_model_9.bin \
      --config_file config/bert_base_6layer_6conect.json \
      --output_dir=/nas-data/outputs/train1_vqa_trained_model/ \
      --summary_writer /nas-data/tensorboards/ \
      --num_workers 16 \
      --tasks 0

Command description

Parameter	Description
u	-u forces the stdout and stderr streams to be unbuffered, so that log output appears immediately
model_version	Which version of the model you want to use
bert_model	Bert pre-trained model selected in the list: bert-base-uncased, bert-large-uncased, bert-base-cased, bert-base-multilingual, bert-base-chinese.
from_pretrained_conceptBert	Folder of the previously trained model. This is the first training run, so the value is `None`
from_pretrained	Pre-trained Bert model (VQA)
config_file	3 config files are available in `conceptBert/config/`
output_dir	Folder where the results are saved
summary_writer	Folder used to save tensorboard items. A sub-folder named after the current date will be created
num_workers	Number of sub-processes the data loader uses for data loading. **Choose a value suited to your environment**
tasks	`--tasks 0` selects the VQA dataset
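If you launch training from a script rather than by hand, the command can be assembled programmatically. The sketch below is only an illustration: the function `build_train_command` is hypothetical, and it uses nothing beyond the flags and example paths shown in this section.

```python
import shlex


def build_train_command(from_pretrained, output_dir, summary_writer,
                        tasks=0, num_workers=16,
                        from_pretrained_concept_bert="None"):
    """Assemble the train_tasks.py argument list from the flags described above."""
    return [
        "python3", "-u", "train_tasks.py",
        "--model_version", "3",
        "--bert_model=bert-base-uncased",
        "--from_pretrained_conceptBert", from_pretrained_concept_bert,
        "--from_pretrained=" + from_pretrained,
        "--config_file", "config/bert_base_6layer_6conect.json",
        "--output_dir=" + output_dir,
        "--summary_writer", summary_writer,
        "--num_workers", str(num_workers),
        "--tasks", str(tasks),
    ]


if __name__ == "__main__":
    # Paths are the example values from this README; replace them with your own.
    cmd = build_train_command(
        from_pretrained="/nas-data/vilbert/data2/kilbert_base_model/pytorch_model_9.bin",
        output_dir="/nas-data/outputs/train1_vqa_trained_model/",
        summary_writer="/nas-data/tensorboards/",
    )
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues with paths that contain spaces.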

2. Train with OK-VQA (fine-tuning)

Then we use the OK-VQA dataset and the model trained in step 1 to train a new model. Use the following command:

  python3 -u train_tasks.py --model_version 3 --bert_model=bert-base-uncased \
      --from_pretrained=/nas-data/vilbert/data2/save_final/VQA_bert_base_6layer_6conect-beta_vilbert_vqa/pytorch_model_11.bin \
      --from_pretrained_conceptBert /nas-data/outputs/train1_vqa_trained_model/VQA_bert_base_6layer_6conect/pytorch_model_19.bin \
      --config_file config/bert_base_6layer_6conect.json \
      --output_dir=/nas-data/outputs/train2_okvqa_trained_model/ \
      --summary_writer /outputs/tensorboards/  \
      --num_workers 16 \
      --tasks 42

Command description

The parameters are the same as above, but these values change:

Parameter	Description
from_pretrained_conceptBert	Path of the model trained previously (step 1, VQA). This is the last `pytorch_model_**.bin` file generated
from_pretrained	Pre-trained Bert model (OK-VQA)
tasks	`--tasks 42` selects the OK-VQA dataset
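Picking the last `pytorch_model_**.bin` file by eye is error-prone, because a plain alphabetical sort puts `pytorch_model_9.bin` after `pytorch_model_19.bin`. The following is a minimal sketch (the helper `latest_checkpoint` is hypothetical, not part of the repository) that selects the checkpoint with the highest numeric suffix.

```python
import re
from pathlib import Path

# Matches file names such as pytorch_model_19.bin and captures the index.
CHECKPOINT_RE = re.compile(r"pytorch_model_(\d+)\.bin")


def latest_checkpoint(model_dir):
    """Return the pytorch_model_<N>.bin path with the highest N, or None."""
    best_path = None
    best_index = -1
    for path in Path(model_dir).iterdir():
        match = CHECKPOINT_RE.fullmatch(path.name)
        if match is None:
            continue  # ignore logs, configs and other files
        index = int(match.group(1))
        if index > best_index:
            best_index = index
            best_path = path
    return best_path
```

For example, calling `latest_checkpoint` on the step 1 output folder should return the file to pass as `--from_pretrained_conceptBert` in step 2.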

3. Validation with OK-VQA

To validate on the held-out validation split, use the model trained in step 2 with the following command:

  python3 -u eval_tasks.py --model_version 3 --bert_model=bert-base-uncased \
      --from_pretrained=/nas-data/vilbert/data2/save_final/VQA_bert_base_6layer_6conect-beta_vilbert_vqa/pytorch_model_11.bin  \
      --from_pretrained_conceptBert=/nas-data/outputs/train2_okvqa_trained_model/OK-VQA_bert_base_6layer_6conect/pytorch_model_99.bin \
      --config_file config/bert_base_6layer_6conect.json \
      --output_dir=/nas-data/outputs/validation_okvqa_trained_model/ \
      --num_workers 16 \
      --tasks 42 \
      --split val

Two files will be generated:

Val_other: gives the top 8 answers for each question
val_result: used in the evaluation

Command description

The parameters are the same as above, but these values change:

Parameter	Description
from_pretrained_conceptBert	Path of the model trained previously (step 2, OK-VQA). This is the last `pytorch_model_**.bin` file generated
from_pretrained	Same pre-trained Bert model (OK-VQA) as in step 2
tasks	`--tasks 42` selects the OK-VQA dataset

:rocket: Evaluation

Run the evaluation with:

  python3 PythonEvaluationTools/vqaEval_okvqa.py \
      --json_dir /nas-data/outputs/validation_okvqa_trained_model/ \
      --output_dir /nas-data/outputs/validation_okvqa_trained_model/

Command description

json_dir: path of the folder where val_result.json is located
output_dir: folder where the accuracy will be saved
/nas-data/outputs/validation_okvqa_trained_model/: location of the final JSON in this example. Replace it with the path of the JSON you want to evaluate.

:bug: Known issues

If the python-prctl installation returns a `Command "python setup.py egg_info" failed with error` error, use this command:

  sudo apt-get install libcap-dev python3-dev

:bulb: Compare the results

Step 1: Training with VQA

20 checkpoints should have been created (the last file should be named pytorch_model_19.bin)

Step 2: Training with OK-VQA

100 checkpoints should have been created (the last file should be named pytorch_model_99.bin)

Step 3: Validation with OK-VQA

The validation generates two JSON files. val_result.json will be used in the evaluation.
Open the logs in the output folder (nas-data) to check the eval_score result:

08/12/2020 13:09:46 - INFO - utils -   Validation [OK-VQA]: loss 3.681 score 33.040

If you want to optimize your model, the loss and score should be at least as good as those above.
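To pull the loss and score out of a log file automatically, a regular expression over lines of the form shown above is enough. This is a sketch that assumes the log format matches the sample line exactly; the helper `parse_validation_line` is hypothetical.

```python
import re

# Pattern based on the sample log line: "Validation [OK-VQA]: loss 3.681 score 33.040"
VALIDATION_RE = re.compile(
    r"Validation \[(?P<task>[^\]]+)\]: loss (?P<loss>[\d.]+) score (?P<score>[\d.]+)"
)


def parse_validation_line(line):
    """Return (task, loss, score) from a validation log line, or None if no match."""
    match = VALIDATION_RE.search(line)
    if match is None:
        return None
    return match.group("task"), float(match.group("loss")), float(match.group("score"))
```

Applying `parse_validation_line` to each line of the log and keeping the non-`None` results gives the validation history of a run.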

Evaluation

Compare your results in the accuracy.json file (results must be at least as good as the following ones).

{
  "overall": 33.04,
  "perQuestionType": {
    "one": 30.82,
    "eight": 33.6,
    "other": 32.57,
    "seven": 30.61,
    "four": 36.79,
    "five": 33.66,
    "three": 31.73,
    "nine": 31.43,
    "ten": 45.58,
    "two": 30.23,
    "six": 30.07
  },
  "perAnswerType": {
    "other": 33.04
  }
}
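The comparison can also be scripted. The sketch below is an illustration, not the project's tooling: the helper names are hypothetical, and it assumes your accuracy.json has the same structure as the reference above. It lists every score that falls below the reference values.

```python
import json

# Reference scores copied from the accuracy.json shown above.
REFERENCE = {
    "overall": 33.04,
    "perQuestionType": {
        "one": 30.82, "two": 30.23, "three": 31.73, "four": 36.79,
        "five": 33.66, "six": 30.07, "seven": 30.61, "eight": 33.6,
        "nine": 31.43, "ten": 45.58, "other": 32.57,
    },
}


def load_accuracy(path):
    """Read an accuracy.json file produced by the evaluation step."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def scores_below_reference(result, reference=REFERENCE):
    """Return {name: (your_score, reference_score)} for every lower score."""
    below = {}
    if result["overall"] < reference["overall"]:
        below["overall"] = (result["overall"], reference["overall"])
    for qtype, ref_score in reference["perQuestionType"].items():
        # A question type missing from the result is reported with None.
        score = result.get("perQuestionType", {}).get(qtype)
        if score is None or score < ref_score:
            below[qtype] = (score, ref_score)
    return below
```

An empty dictionary from `scores_below_reference(load_accuracy(path))` means the run matches or exceeds the reference on the overall score and on every question type.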

VQA Training

Documentation here

OK-VQA Training

Documentation here

Troubleshooting

CUDA out of memory

Try the following recommendation to resolve the problem:

Change the value of num_workers in your training command (e.g. --num_workers 1)
Try one of the proposed improvements below
Reduce the following parameters in vlbert_tasks.yml:
- max_seq_length
- batch_size
- eval_batch_size

Example:

  max_seq_length: 4 # DGX value : 16
  batch_size: 256 # DGX value : 1024
  eval_batch_size: 256 # DGX value : 1024

Improvements

There are several areas for improvement:

Review where to.device() is called in the code and move the calls so that data is transferred to the device at a better position
Load only part of the dataset at a time (create a method that loads one batch of the dataset). Dataset management is in vqa_dataset.py, method _load_dataset, in the variables questions = questions_train + questions_val[:-3000] and answers = answers_train + answers_val[:-3000]
Train your own BERT (or find a lighter Bert)
Initialise Bert once and reload it afterwards
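As a starting point for the partial-loading idea, the sketch below shows one way to iterate over the dataset in fixed-size pieces. It assumes that `questions` and `answers` are parallel lists, as the expressions in `_load_dataset` suggest. The generator `iter_chunks` is hypothetical and is not the repository's implementation.

```python
def iter_chunks(questions, answers, chunk_size):
    """Yield (questions, answers) slices of at most chunk_size items each."""
    if len(questions) != len(answers):
        raise ValueError("questions and answers must have the same length")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(questions), chunk_size):
        end = start + chunk_size
        yield questions[start:end], answers[start:end]
```

Each yielded pair can then be turned into entries and tensors separately, so that only one chunk needs to be held in memory at a time.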
