GinJinn2 provides a collection of command-line tools for bounding-box object detection and instance segmentation based on Detectron2. Besides providing a convenient interface to the latter, GinJinn2 offers several utility functions to facilitate building custom pipelines.
Our comprehensive documentation, including various usage examples, can be found at https://ginjinn2.readthedocs.io/en/latest/.
It is recommended to install GinJinn2 via Mamba, a more efficient reimplementation of the Conda package management system.
To install Mamba (Miniforge distribution), run the following commands in your Linux terminal:
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh
mamba init
Before installing GinJinn2, we recommend creating a new Conda/Mamba environment to avoid possible version conflicts with existing software. Here, we use Python 3.8; other versions may also work. The environment to be created is named "gj".
mamba create -n gj python=3.8
To activate this environment, run:
mamba activate gj
Inside the activated environment, run the following command to install GinJinn2 (replace 10.1 with the CUDA version matching your GPU driver; 10.1 should work for most modern GPUs):
mamba install -c agoberprieler -c pytorch cudatoolkit=10.1 ginjinn2
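If you are unsure which CUDA version your driver supports, recent NVIDIA drivers report the maximum supported CUDA version in the header of the nvidia-smi output:
nvidia-smi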
Finally, test your installation:
ginjinn -h
Make sure to activate your Conda environment via conda activate MY_ENV_NAME prior to running any ginjinn command.
ginjinn and all of its subcommands provide help pages, which can be displayed using the argument -h or --help, e.g.:
ginjinn -h (get a list of all essential GinJinn commands)
ginjinn utils -h (get a list of GinJinn's additional utilities)
ginjinn utils crop -h (get usage information for the cropping utility)
The help pages briefly describe basic functionality and command-specific arguments. For further explanation, see Getting Started and Toolbox.
A (labeled) input dataset should consist of a single image directory containing JPG images at its top level, plus accompanying annotations. GinJinn2 supports two common annotation formats: the COCO data format (one JSON file per dataset), which is also used as output format, and XML files as used by PASCAL VOC (one file per image). The latter, however, is only supported for bounding-box object detection.
Although not mandatory, it is recommended to place the image directory and annotations in a common directory to enable more compact command invocations. If datasets are structured as shown below, the user does not have to specify the image directory explicitly. Note that the file names are arbitrary; a schematic COCO annotation file is shown after the directory listings.
COCO
data
├── annotations.json
└── images
├── 1.jpg
├── 2.jpg
└── ...
Pascal VOC
data
├── annotations
│ ├── 1.xml
│ ├── 2.xml
│ └── ...
└── images
├── 1.jpg
├── 2.jpg
└── ...
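For illustration, the COCO annotation file referenced above has roughly the following structure (a schematic sketch with placeholder values; the category name is made up). Bounding boxes are given as [x, y, width, height] in pixels, and the segmentation polygon is only needed for instance segmentation:
{
  "images": [
    {"id": 1, "file_name": "1.jpg", "width": 800, "height": 600}
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "bbox": [100, 150, 200, 120],
      "area": 24000,
      "iscrowd": 0,
      "segmentation": [[100, 150, 300, 150, 300, 270, 100, 270]]
    }
  ],
  "categories": [
    {"id": 1, "name": "leaf"}
  ]
}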
In case of nested image directories, ginjinn utils flatten helps to convert datasets to an accepted format.
In addition to the dataset for training the model, it is advisable to provide a validation dataset, which can be used to optimize (hyper)parameters and to detect overfitting. A further test dataset, if available, allows an unbiased evaluation of the final, trained model.
ginjinn split can be used to partition a single dataset such that each image, along with its annotated objects, is assigned to exactly one of two or three sub-datasets ("train", "val", "test"). To achieve a balanced split across different object categories, a simple heuristic is used to propose dataset partitions; an illustrative invocation follows the directory listings below. The generated output has the following structure:
COCO
data
├── train
│ ├── annotations.json
│ └── images
├── val
│ ├── annotations.json
│ └── images
└── test
├── annotations.json
└── images
Pascal VOC
data
├── train
│ ├── annotations
│ └── images
├── val
│ ├── annotations
│ └── images
└── test
├── annotations
└── images
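As an example, the following hypothetical invocation would split a COCO dataset into 60% training, 20% validation, and 20% test data. The option names used here are assumptions for illustration only; run ginjinn split -h for the actual interface:
ginjinn split -a data/annotations.json -i data/images -o data_split -v 0.2 -t 0.2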
ginjinn new
This command generates a new project directory, which is required for training, evaluation, and prediction. Initially, it only contains an empty output folder and the GinJinn2 configuration file (“ginjinn_config.yaml”), a simple, formatted text file for storing various settings, which can be edited by the user for customization. When executing certain GinJinn2 commands, further data may be written to the project directory. To avoid inconsistencies, it is strongly recommended to keep the configuration file fixed throughout subsequent steps of the analysis.
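For orientation, the configuration file covers settings along the lines of the following sketch. All keys and values shown here are illustrative assumptions, not the authoritative schema; consult the documentation for the complete set of options:
# Illustrative excerpt of ginjinn_config.yaml (keys are assumptions)
task: bbox-detection            # or instance-segmentation
input:
  type: COCO
  training:
    annotation_path: data/train/annotations.json
    image_path: data/train/images
  validation:
    annotation_path: data/val/annotations.json
    image_path: data/val/images
model:
  name: faster_rcnn_R_50_FPN_3x # a Detectron2 model zoo identifier
training:
  learning_rate: 0.00125
  batch_size: 1
  max_iter: 5000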
The -t/--template and -d/--data_dir options allow various settings to be specified automatically, such that a valid configuration can be created without manually editing the configuration file.
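For example, assuming the split dataset from above, a new project could be initialized with a command along these lines (the template name and project path are placeholders; available templates are listed in the documentation):
ginjinn new -t faster_rcnn_R_50_FPN_3x -d data_split/train my_project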
ginjinn train
This command trains the model and simultaneously evaluates it using the validation dataset, if available. Checkpoints are saved periodically at predefined intervals. While the training is running, its progress can be most easily monitored via the "outputs/metrics.pdf" file in your project directory.
If the training has finished or was interrupted, it can be resumed with -r/--resume. The number of iterations specified in the configuration file can be overridden with -n/--n_iter.
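Assuming the project directory from above, a training run and a later resumption with an increased iteration number might look like this (sketch):
ginjinn train my_project
ginjinn train -r -n 20000 my_project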
ginjinn evaluate
This command calculates COCO evaluation metrics for a trained model using the test dataset.
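Continuing the sketch above, evaluation then only requires the project directory:
ginjinn evaluate my_project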
ginjinn predict
This command uses a trained model to predict objects in new, unlabeled images. It provides several optional types of output: a COCO annotation file, visualizations of the predicted objects drawn on the original images, and cropped images (segmentation masks) for each predicted object.
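A hypothetical invocation, assuming new, unlabeled images in a directory named new_images (the option name for the image input is an assumption; see ginjinn predict -h for the actual interface):
ginjinn predict my_project -i new_images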
Concrete workflows including more complex examples are described at Getting Started and Example Applications.
GinJinn2 is released under the Apache 2.0 license.
Ott T. and Lautenschlager U. (2021). GinJinn2: Object detection and segmentation for ecology and evolution. Methods in Ecology and Evolution. https://doi.org/10.1111/2041-210X.13787
Please also cite Detectron2 when using models from Detectron2's model zoo, and CascadePSP when using the segmentation-refinement option of ginjinn predict.