ancasag / ensembleObjectDetection

MIT License

Ensemble methods for object detection

In this repository, we provide the code for ensembling the output of object detection models, and for applying test-time augmentation to object detection. This library has been designed to be applicable to any object detection model, independently of the underlying algorithm and the framework employed to implement it. A draft describing the techniques implemented in this repository is available in the following article.

  1. Installation
  2. Ensemble of models
  3. Test-time augmentation for object detection
  4. Adding new models
  5. Experiments
  6. Citation
  7. Acknowledgments

Installation and Requirements

This library requires Python 3.6 and the packages listed in requirements.txt.

Installation:

  1. Clone this repository
git clone https://github.com/ancasag/ensembleObjectDetection
  2. Install dependencies
pip3 install -r requirements.txt

Ensemble of models

In the following image, we show an example of the workflow of our ensemble algorithm. Three methods have been applied to detect the objects in the original image: the first method has detected the person and the horse; the second, the person and the dog; and the third, the person, the dog, and an undefined region. The first step of our ensemble method groups the overlapping regions. Subsequently, a voting strategy is applied to discard some of those groups. The final predictions are obtained using the non-maximum suppression (NMS) algorithm.

TestTimeAugmentation
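The grouping and suppression steps described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the code of this repository: the tuple layout of a detection, the function names, the greedy grouping rule, the 0.5 IoU threshold, and the sample coordinates are all hypothetical, and NMS is simplified to keeping the highest-scoring box of each group.

```python
from typing import List, Tuple

# Assumed layout of a detection: (label, score, xmin, ymin, xmax, ymax).
Detection = Tuple[str, float, float, float, float, float]


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of the boxes of two detections."""
    inter_w = max(0.0, min(a[4], b[4]) - max(a[2], b[2]))
    inter_h = max(0.0, min(a[5], b[5]) - max(a[3], b[3]))
    inter = inter_w * inter_h
    area_a = (a[4] - a[2]) * (a[5] - a[3])
    area_b = (b[4] - b[2]) * (b[5] - b[3])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def group_detections(dets: List[Detection], thr: float = 0.5) -> List[List[Detection]]:
    """Greedy grouping: a detection joins the first group whose leading box
    has the same label and an IoU of at least thr; otherwise it starts a group."""
    groups: List[List[Detection]] = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        for group in groups:
            if group[0][0] == det[0] and iou(group[0], det) >= thr:
                group.append(det)
                break
        else:
            groups.append([det])
    return groups


def nms_group(group: List[Detection]) -> Detection:
    """Simplified NMS inside one group: keep the highest-scoring box."""
    return max(group, key=lambda d: d[1])


# Made-up outputs of three detectors, mirroring the example in the text.
method_1 = [("person", 0.90, 10, 10, 50, 100), ("horse", 0.80, 60, 40, 160, 120)]
method_2 = [("person", 0.85, 12, 11, 52, 98), ("dog", 0.70, 170, 80, 210, 120)]
method_3 = [("person", 0.80, 9, 12, 49, 101), ("dog", 0.60, 172, 82, 212, 121),
            ("unknown", 0.30, 300, 300, 320, 330)]

groups = group_detections(method_1 + method_2 + method_3)
group_sizes = [(g[0][0], len(g)) for g in groups]
best_per_group = [nms_group(g) for g in groups]
```

With this sample data, the person is supported by three detections, the dog by two, and the horse and the undefined region by one each; the size of each group is what a voting strategy would then inspect.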

Ensemble Options

Three different voting strategies can be applied with our ensemble algorithm: affirmative, consensus, and unanimous.
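As a rough illustration of how the affirmative, consensus, and unanimous strategies could differ, the sketch below filters groups by the number of detections they contain. The thresholds are our assumption about the usual reading of these names (any method, more than half of the methods, all methods), and the function names and sample counts are hypothetical; check the article for the exact definitions.

```python
from typing import Dict, List


def required_votes(strategy: str, num_models: int) -> int:
    """Minimum group size needed to keep a group (assumed semantics)."""
    if strategy == "affirmative":
        return 1                      # one method is enough
    if strategy == "consensus":
        return num_models // 2 + 1    # strictly more than half of the methods
    if strategy == "unanimous":
        return num_models             # every method must agree
    raise ValueError("unknown strategy: " + strategy)


def vote(group_sizes: Dict[str, int], strategy: str, num_models: int) -> List[str]:
    """Return the names of the groups that survive the voting step."""
    needed = required_votes(strategy, num_models)
    return [name for name, size in group_sizes.items() if size >= needed]


# Made-up group sizes for three methods: how many of them detected each region.
sizes = {"person": 3, "horse": 1, "dog": 2, "unknown": 1}
kept = {s: vote(sizes, s, 3) for s in ("affirmative", "consensus", "unanimous")}
```

Here the affirmative strategy keeps all four groups, consensus keeps the person and the dog, and unanimous keeps only the person.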

Execution

To run the ensemble algorithm, edit the file mainModel.py in the TestTimeAugmentation folder to configure the models to use. Then invoke the following command, where pathOfDataset is the path where the images are saved and option is the voting strategy (affirmative, consensus, or unanimous).

python mainModel.py -d pathOfDataset -o option

A simpler way to use this method is provided in the following notebook.

Test-time augmentation for object detection

In the following image, we show an example of the workflow of test-time augmentation (from now on, TTA) for object detectors. First, we apply three transformations to the original image: a histogram equalisation, a horizontal flip, and a "none" transformation (which leaves the image unchanged). Subsequently, we detect the objects in the new images, and apply the corresponding inverse transformation to each detection so that the objects are located in the correct position in the original image. Finally, the detections are ensembled using the consensus strategy.

TestTimeAugmentation
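The step that maps detections back to the original image can be illustrated for the horizontal flip. The sketch below is not the code of this repository: the function names are hypothetical, images are plain lists of rows, boxes are (xmin, ymin, xmax, ymax) with exclusive upper bounds, and a toy detector that returns the bounding box of the non-zero pixels stands in for a real model.

```python
from typing import List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax), upper bounds exclusive


def hflip_image(image: Image) -> Image:
    """Horizontal flip: reverse every row."""
    return [row[::-1] for row in image]


def hflip_box(box: Box, width: int) -> Box:
    """Map a box between an image and its horizontal flip.
    The mapping is its own inverse, so it also brings detections back."""
    xmin, ymin, xmax, ymax = box
    return (width - xmax, ymin, width - xmin, ymax)


def toy_detector(image: Image) -> Box:
    """Stand-in for a real detector: bounding box of the non-zero pixels."""
    xs = [x for row in image for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(image) if any(row)]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


original = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
width = len(original[0])

flipped = hflip_image(original)
box_in_flipped = toy_detector(flipped)         # detection on the augmented image
box_restored = hflip_box(box_in_flipped, width)  # back in original coordinates
```

The restored box coincides with the box detected directly on the original image. Transformations that do not move pixels, such as histogram equalisation or the "none" transformation, need no correction of the boxes.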

Ensemble Options

As indicated previously, three different voting strategies can be applied for TTA: affirmative, consensus, and unanimous.

Techniques for TTA

These are all the techniques that we have defined for use in the TTA process. The first column corresponds to the name assigned to the technique, and the second column describes the technique.

Execution

To run TTA, edit the mainTTA.py file in the TestTimeAugmentation folder to configure the model to use and the transformation techniques. Then invoke the following command, where pathOfDataset is the path where the images are saved and option is the voting strategy (affirmative, consensus, or unanimous).

python mainTTA.py -d pathOfDataset -o option

A simpler way to use this method is provided in the following notebook.

Adding new models

This open source library can be extended to work with any object detection model regardless of the algorithm and framework used to build it. To do this, it is necessary to create a new class that extends the IPredictor class of the following diagram:

DiagramModels

Several examples of classes extending the IPredictor class can be seen in the testTimeAugmentation.py file. Specifically, it is necessary to define a class with a predict method that takes as input the path to a folder containing the images, and stores the predictions in the Pascal VOC format in the same folder. Once this new class has been created, it can be applied both for the ensemble of models and for TTA.
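A minimal sketch of such a class is shown below. It is an illustration only: the IPredictor defined here is a hypothetical stand-in for the one in testTimeAugmentation.py (whose constructor and attributes may differ), DummyPredictor and its detect method are invented names, the detections are hard-coded, and the confidence tag is a non-standard addition to the Pascal VOC layout.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class IPredictor(ABC):
    """Hypothetical stand-in for the library's IPredictor base class."""

    @abstractmethod
    def predict(self, image_folder: str) -> None:
        """Write one Pascal VOC XML file per image into image_folder."""


class DummyPredictor(IPredictor):
    """Toy predictor that reports the same fixed box for every image."""

    def detect(self, image_path):
        # A real subclass would run its detection model on image_path here.
        return [("person", 0.9, 10, 20, 110, 220)]

    def predict(self, image_folder: str) -> None:
        for name in sorted(os.listdir(image_folder)):
            if not name.lower().endswith((".jpg", ".jpeg", ".png")):
                continue
            root = ET.Element("annotation")
            ET.SubElement(root, "filename").text = name
            path = os.path.join(image_folder, name)
            for label, score, xmin, ymin, xmax, ymax in self.detect(path):
                obj = ET.SubElement(root, "object")
                ET.SubElement(obj, "name").text = label
                ET.SubElement(obj, "confidence").text = str(score)  # non-standard tag
                box = ET.SubElement(obj, "bndbox")
                for tag, value in zip(("xmin", "ymin", "xmax", "ymax"),
                                      (xmin, ymin, xmax, ymax)):
                    ET.SubElement(box, tag).text = str(value)
            xml_name = os.path.splitext(name)[0] + ".xml"
            ET.ElementTree(root).write(os.path.join(image_folder, xml_name))


# Usage on a temporary folder containing one empty placeholder "image".
folder = tempfile.mkdtemp()
open(os.path.join(folder, "img1.jpg"), "wb").close()
DummyPredictor().predict(folder)
annotation = ET.parse(os.path.join(folder, "img1.xml")).getroot()
detected_label = annotation.find("object/name").text
```

Because the predictions are exchanged as files in the image folder, the rest of the pipeline does not need to know which framework produced them.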

Available models

Currently, the library can work with the following models:

You can see several examples of these models in the notebook for ensembling models.

Experiments

Several experiments were conducted to test this library and the results are presented in the article. Here, we provide the datasets and models used for those experiments.

Pascal VOC

For the experiments of Section 4.1 of the paper, we employed the test set of The PASCAL Visual Object Classes Challenge and the pre-trained models provided by the MXNet library.

Stomata

For the experiments of Section 4.2 of the paper, we employed two stomata datasets:

Using these datasets, we trained YOLO models using the Darknet framework:

Tables

For the experiments of Section 4.3 of the paper, we employed two table datasets:

Using the ICDAR 2013 dataset, we trained several models:

We have also trained several models for the ICDAR 2013 dataset via model distillation, using the images of the Word part of the TableBank dataset:

Citation

Use this BibTeX entry to cite this work:

@misc{CasadoGarcia19,
  title={Ensemble Methods for Object Detection},
  author={A. Casado-García and J. Heras},
  year={2019},
  note={\url{https://github.com/ancasag/ensembleObjectDetection}},
}

Acknowledgments

This work was partially supported by Ministerio de Economía y Competitividad [MTM2017-88804-P], Ministerio de Ciencia, Innovación y Universidades [RTC-2017-6640-7], Agencia de Desarrollo Económico de La Rioja [2017-I-IDD-00018], and the computing facilities of Extremadura Research Centre for Advanced Technologies (CETA-CIEMAT), funded by the European Regional Development Fund (ERDF). CETA-CIEMAT belongs to CIEMAT and the Government of Spain. We also thank Álvaro San-Sáez for providing us with the stomata datasets.