In this repository, we provide the code for ensembling the outputs of object detection models and for applying test-time augmentation to object detection. This library has been designed to be applicable to any object detection model, independently of the underlying algorithm and the framework employed to implement it. A draft describing the techniques implemented in this repository is available in the following article.
This library requires Python 3.6 and the packages listed in requirements.txt.
Installation:
git clone https://github.com/ancasag/ensembleObjectDetection
cd ensembleObjectDetection
pip3 install -r requirements.txt
In the following image, we show an example of the workflow of our ensemble algorithm. Three methods have been applied to detect the objects in the original image: the first method has detected the person and the horse; the second, the person and the dog; and the third, the person, the dog, and an undefined region. The first step of our ensemble method groups the overlapping regions. Subsequently, a voting strategy is applied to discard some of those groups. The final predictions are obtained using the NMS algorithm.
Three different voting strategies can be applied with our ensemble algorithm:
- Affirmative: a detection is kept whenever at least one of the methods detects the region.
- Consensus: a detection is kept only when a majority of the methods detect the region.
- Unanimous: a detection is kept only when all the methods detect the region.
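As an illustration of how the vote works, the following sketch (not the library's actual code) decides whether a group of overlapping detections survives each strategy, assuming we know how many of the m methods contributed a box to the group:

```python
# Illustrative sketch of the three voting strategies (not the library's
# internal implementation). A "group" is a set of overlapping detections;
# we assume we know how many of the m methods contributed a box to it.

def keep_group(num_methods_in_group, m, strategy):
    """Decide whether a group of overlapping detections survives the vote.

    num_methods_in_group: how many distinct methods detected this region.
    m: total number of methods (or of transformed images, for TTA).
    strategy: 'affirmative', 'consensus', or 'unanimous'.
    """
    if strategy == "affirmative":
        # Keep the region if at least one method detected it.
        return num_methods_in_group >= 1
    if strategy == "consensus":
        # Keep the region if a strict majority of the methods detected it.
        return num_methods_in_group > m / 2
    if strategy == "unanimous":
        # Keep the region only if every method detected it.
        return num_methods_in_group == m
    raise ValueError(f"unknown strategy: {strategy}")
```

For instance, with three methods, a region detected by two of them survives the consensus vote but not the unanimous one.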
In order to run the ensemble algorithm, you can edit the file mainModel.py from the TestTimeAugmentation folder to configure the models to use, and then invoke the following command, where pathOfDataset is the path to the folder where the images are saved, and option is the voting strategy (affirmative, consensus, or unanimous).
python mainModel.py -d pathOfDataset -o option
A simpler way to use this method is provided in the following notebook.
In the following image, we show an example of the workflow of test-time augmentation (from now on, TTA) for object detectors. First, we apply three transformations to the original image: a histogram equalisation, a horizontal flip, and a "none" transformation (which leaves the image unchanged). Subsequently, we detect the objects in the new images and apply the corresponding inverse transformation to the detections, so that the objects are located in their correct positions in the original image. Finally, the detections are ensembled using the consensus strategy.
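To make the back-transformation step concrete, here is a minimal sketch for the horizontal-flip case, assuming OpenCV and a hypothetical detect function standing in for any object detector:

```python
# Minimal sketch of TTA with a horizontal flip (illustrative only; the
# `detect` argument is a hypothetical stand-in for any object detector).
import cv2

def detect_with_hflip(image, detect):
    """Run `detect` on the horizontally flipped image and map the
    resulting boxes back to the original image's coordinates."""
    flipped = cv2.flip(image, 1)  # flip around the vertical axis
    width = image.shape[1]
    # detect returns [(x_min, y_min, x_max, y_max, label, score), ...]
    boxes = detect(flipped)
    # Undo the flip: a box spanning [x_min, x_max] in the flipped image
    # spans [width - x_max, width - x_min] in the original image.
    return [(width - x_max, y_min, width - x_min, y_max, label, score)
            for (x_min, y_min, x_max, y_max, label, score) in boxes]
```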
As indicated previously, the same three voting strategies (affirmative, consensus, and unanimous) can be applied for TTA.
These are all the techniques that we have defined for the TTA process; the first column gives the name assigned to each technique, and the second column describes it.
In order to run the TTA algorithm, you can edit the mainTTA.py file from the TestTimeAugmentation folder to configure the model to use and the transformation techniques, and then invoke the following command, where pathOfDataset is the path to the folder where the images are saved, and option is the voting strategy (affirmative, consensus, or unanimous).
python mainTTA.py -d pathOfDataset -o option
A simpler way to use this method is provided in the following notebook.
This open-source library can be extended to work with any object detection model, regardless of the algorithm and framework used to build it. To do this, it is necessary to create a new class that extends the IPredictor class of the following diagram:
Several examples of classes extending the IPredictor class can be seen in the testTimeAugmentation.py file. Namely, it is necessary to define a class with a predict method that takes as input the path to a folder containing the images and stores the predictions, in the Pascal VOC format, in the same folder. Once this new class has been created, it can be applied both for the ensemble of models and for TTA.
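For instance, a new detector could be plugged in with a class along the following lines. This is a sketch only: we assume IPredictor can be imported from testTimeAugmentation.py, and the helpers run_my_detector and write_voc_xml are hypothetical placeholders for your detector and a Pascal VOC writer.

```python
# Sketch of a custom predictor for a new detection model.
# run_my_detector and write_voc_xml are hypothetical placeholders,
# not part of the library.
import os
from testTimeAugmentation import IPredictor

class MyDetectorPred(IPredictor):
    def __init__(self, weights_path, classes_path):
        self.weights_path = weights_path
        self.classes_path = classes_path

    def predict(self, img_folder):
        # Detect the objects in each image of the folder and store the
        # predictions, in Pascal VOC format, next to the image.
        for name in os.listdir(img_folder):
            if not name.lower().endswith((".jpg", ".jpeg", ".png")):
                continue
            img_path = os.path.join(img_folder, name)
            boxes = run_my_detector(img_path, self.weights_path,
                                    self.classes_path)
            xml_path = os.path.splitext(img_path)[0] + ".xml"
            write_voc_xml(img_path, boxes, xml_path)
```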
Currently, the library can work with models constructed with the following frameworks and architectures, each through its own predictor class:
- DarknetYoloPred class. The constructor takes as input the path to the weights of the model, the path to the file with the names of the classes for the model, and the configuration file.
- MXnetFasterRCNNPred class. The constructor takes as input the path to the weights of the model and the path to the file with the names of the classes for the model.
- MXnetSSD512Pred class. The constructor takes as input the path to the weights of the model and the path to the file with the names of the classes for the model.
- MXnetYoloPred class. The constructor takes as input the path to the weights of the model and the path to the file with the names of the classes for the model.
- RetinaNetResnet50Pred class. The constructor takes as input the path to the weights of the model and the path to the file with the names of the classes for the model.
- MaskRCNNPred class. The constructor takes as input the path to the weights of the model and the path to the file with the names of the classes for the model.

You can see several examples of these models in the notebook for ensembling models.
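For example, based on the constructor descriptions above, two of the predictors could be instantiated as follows. This is a sketch: the file paths are placeholders for your own model files, and we assume the classes are imported from testTimeAugmentation.py, as described above.

```python
# Sketch of instantiating two of the predictor classes listed above;
# all file paths are placeholders for your own model files.
from testTimeAugmentation import DarknetYoloPred, MXnetSSD512Pred

# Darknet YOLO: weights, class names, and configuration file.
yolo = DarknetYoloPred("models/yolov3.weights",
                       "models/classes.names",
                       "models/yolov3.cfg")

# MXNet SSD512: weights and class names.
ssd = MXnetSSD512Pred("models/ssd512.params", "models/classes.names")
```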
Several experiments were conducted to test this library and the results are presented in the article. Here, we provide the datasets and models used for those experiments.
For the experiments of Section 4.1 of the paper, we employed the test set of the PASCAL Visual Object Classes Challenge and the pre-trained models provided by the MXNet library.
For the experiments of Section 4.2 of the paper, we employed two stomata datasets:
Using these datasets, we trained YOLO models using the Darknet framework:
For the experiments of Section 4.3 of the paper, we employed two table datasets:
Using the ICDAR 2013 dataset, we trained several models:
We have also trained several models for the ICDAR 2013 dataset via model distillation, using the images of the Word part of the TableBank dataset:
Use the following BibTeX entry to cite this work:
@misc{CasadoGarcia19,
  title={Ensemble Methods for Object Detection},
  author={A. Casado-García and J. Heras},
  year={2019},
  note={\url{https://github.com/ancasag/ensembleObjectDetection}},
}
This work was partially supported by Ministerio de Economía y Competitividad [MTM2017-88804-P], Ministerio de Ciencia, Innovación y Universidades [RTC-2017-6640-7], Agencia de Desarrollo Económico de La Rioja [2017-I-IDD-00018], and the computing facilities of Extremadura Research Centre for Advanced Technologies (CETA-CIEMAT), funded by the European Regional Development Fund (ERDF). CETA-CIEMAT belongs to CIEMAT and the Government of Spain. We also thank Álvaro San-Sáez for providing us with the stomata datasets.