Azure Computer Vision

Language: C#
Training required?: No
Input: Image as stream to API endpoint
Returns: JSON containing info about detected objects (with descriptions), bounding boxes and an overall image description
Good for live video feed?: No, since running detection on every frame would consume too many API calls.

Works only for very basic, everyday objects such as bottles and cups. Detecting a cactus, for example, does not work.
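To make the Returns field concrete, here is a minimal Python sketch (the project itself used C#) that pulls the caption and the object boxes out of such a response. It assumes the response shape of the Computer Vision Analyze Image REST API with the `Objects` and `Description` visual features: an `objects` list whose entries carry `object`, `confidence` and a `rectangle` with `x`, `y`, `w`, `h`, plus `description.captions`. The sample payload is invented for illustration and `summarize_analysis` is a hypothetical helper.

```python
import json

# Hypothetical sample payload, shaped like a Computer Vision "Analyze Image"
# response requested with visualFeatures=Objects,Description.
SAMPLE_RESPONSE = """
{
  "objects": [
    {"rectangle": {"x": 25, "y": 43, "w": 172, "h": 140}, "object": "cup", "confidence": 0.88},
    {"rectangle": {"x": 210, "y": 20, "w": 60, "h": 190}, "object": "bottle", "confidence": 0.71}
  ],
  "description": {
    "tags": ["indoor", "table"],
    "captions": [{"text": "a cup and a bottle on a table", "confidence": 0.64}]
  }
}
"""


def summarize_analysis(payload: str):
    """Return (caption, detections); each detection is (name, confidence, (x1, y1, x2, y2))."""
    data = json.loads(payload)
    captions = data.get("description", {}).get("captions", [])
    # Pick the highest-confidence caption, if the service returned any.
    caption = max(captions, key=lambda c: c["confidence"])["text"] if captions else None
    detections = []
    for obj in data.get("objects", []):
        r = obj["rectangle"]
        # Convert x/y/width/height into corner coordinates.
        box = (r["x"], r["y"], r["x"] + r["w"], r["y"] + r["h"])
        detections.append((obj["object"], obj["confidence"], box))
    return caption, detections


if __name__ == "__main__":
    caption, detections = summarize_analysis(SAMPLE_RESPONSE)
    print(caption)
    for name, conf, box in detections:
        print(f"{name}: {conf:.2f} at {box}")
```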

Azure Custom Vision

Language: C#
Training required?: Yes. You have to upload images and label every object in each image. This can be done programmatically by providing JSON with the bounding-box regions and a label, or with the online image tagger provided by Microsoft. Training a cactus class worked well and the detection was okay, although the training dataset was not distinctive enough to give satisfactory results.
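Custom Vision regions are given in normalized coordinates (`left`, `top`, `width`, `height` as fractions of the image size), so boxes labeled in pixels need converting before upload. A minimal sketch of that step; `PixelBox` and `to_region` are hypothetical helpers, the dict mirrors the region fields (`tagId` plus the four coordinates) rather than a complete upload request, and the tag ID is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class PixelBox:
    """A labeled bounding box in pixel coordinates (top-left corner plus size)."""
    x: int
    y: int
    w: int
    h: int


def to_region(box: PixelBox, image_width: int, image_height: int, tag_id: str) -> dict:
    """Convert a pixel box into a normalized region dict (all values in 0..1)."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image size must be positive")
    # Clamp the box to the image so the normalized values stay within [0, 1].
    left = max(0, min(box.x, image_width))
    top = max(0, min(box.y, image_height))
    right = max(left, min(box.x + box.w, image_width))
    bottom = max(top, min(box.y + box.h, image_height))
    return {
        "tagId": tag_id,
        "left": left / image_width,
        "top": top / image_height,
        "width": (right - left) / image_width,
        "height": (bottom - top) / image_height,
    }


if __name__ == "__main__":
    # Placeholder tag ID; real IDs come from the Custom Vision project.
    print(to_region(PixelBox(x=160, y=120, w=320, h=240), 640, 480, "cactus-tag-id"))
```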

Input: Image as stream to API endpoint
Returns: JSON containing the detected objects (tag name and probability) and their bounding boxes
Good for live video feed?: No, since running detection on every frame would consume too many API calls.
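The API-call cost is easy to quantify: sending every frame of a 30 FPS feed means 30 calls per second, or 108,000 calls per hour. A common workaround is to send only a few frames per second. The sketch below shows that idea; `FrameThrottle` is a hypothetical helper and the rate of 2 calls per second is an arbitrary example, not a limit of the service.

```python
class FrameThrottle:
    """Decide which frames to send so that at most `max_calls_per_second` API calls are made."""

    def __init__(self, max_calls_per_second: float):
        if max_calls_per_second <= 0:
            raise ValueError("max_calls_per_second must be positive")
        self.min_interval = 1.0 / max_calls_per_second
        self.last_sent = None  # timestamp (seconds) of the last frame sent

    def should_send(self, timestamp: float) -> bool:
        """Return True if the frame captured at `timestamp` (seconds) should be sent."""
        if self.last_sent is None or timestamp - self.last_sent >= self.min_interval:
            self.last_sent = timestamp
            return True
        return False


if __name__ == "__main__":
    fps = 30
    throttle = FrameThrottle(max_calls_per_second=2)
    # Simulate 3 seconds of a 30 FPS feed.
    timestamps = [i / fps for i in range(3 * fps)]
    sent = [t for t in timestamps if throttle.should_send(t)]
    print(f"sent {len(sent)} of {len(timestamps)} frames")
    # Sending every frame instead would cost fps * 3600 calls per hour.
    print(f"every frame: {fps * 3600} calls/hour")
```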

Offline Model based on YOLOv3 CPU (ImageAI)

https://github.com/OlafenwaMoses/ImageAI

Language: Python
Training required?: Optional. YOLOv3 was trained on only 80 classes, which cover most everyday objects (bottles, cups, cars, etc.). For more specific objects such as a cactus, you have to retrain the model yourself.

Input: Image as stream to library
Returns: Object containing info about detected objects (with descriptions) and bounding boxes
Good for live video feed?: Yes, though very slow. With the fastest approach I was able to get 2 FPS.
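Whether retraining is needed can be checked up front against the class list of the pretrained weights. The sketch below assumes the standard 80 COCO class names as spelled in the Ultralytics YOLOv5 models; other builds spell a few differently (Darknet's `coco.names` uses `motorbike`, `aeroplane`, `sofa`, `tvmonitor`), so compare against the names file that ships with your weights. `needs_retraining` is a hypothetical helper.

```python
# The 80 COCO class names, spelled as in the Ultralytics YOLOv5 models
# (assumption: other YOLO builds may spell a few of them differently).
COCO_CLASSES = (
    "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck",
    "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
    "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra",
    "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
    "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove",
    "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup",
    "fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange",
    "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
    "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse",
    "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
    "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
    "hair drier", "toothbrush",
)


def needs_retraining(label: str, known_classes=COCO_CLASSES) -> bool:
    """Return True if `label` is not one of the classes the pretrained model knows."""
    return label.strip().lower() not in known_classes


if __name__ == "__main__":
    for label in ("cup", "bottle", "cactus"):
        print(label, "-> retrain" if needs_retraining(label) else "-> pretrained model is enough")
```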

Offline Model based on MobileNet SSD CPU

Language: Python
Training required?: Optional

Input: Image as stream to library
Returns: Object containing info about detected objects (with descriptions) and bounding boxes
Good for live video feed?: Yes, very fast. Probably the best model for live video prediction.

The results, however, are very unreliable and work only for really basic objects, but training this model on a custom dataset could prove worthwhile.

Video: https://streamable.com/3qh0mf
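How the raw output is read depends on the runtime. Assuming the model is run through OpenCV's `dnn` module (the write-up does not say which runtime was used), an SSD network returns rows of `[image_id, class_id, confidence, x1, y1, x2, y2]` with coordinates normalized to 0..1. A minimal sketch that turns such rows into labeled pixel boxes; `parse_ssd_rows` is a hypothetical helper and the class list in the example is made up.

```python
from typing import List, Sequence, Tuple

Detection = Tuple[str, float, Tuple[int, int, int, int]]


def parse_ssd_rows(rows: Sequence[Sequence[float]], class_names: Sequence[str],
                   width: int, height: int, min_confidence: float = 0.5) -> List[Detection]:
    """Turn SSD output rows into (label, confidence, pixel box) tuples.

    Each row is assumed to be [image_id, class_id, confidence, x1, y1, x2, y2]
    with coordinates normalized to 0..1.
    """
    detections = []
    for _, class_id, confidence, x1, y1, x2, y2 in rows:
        if confidence < min_confidence:
            continue  # drop weak detections; SSD emits many low-score boxes
        # Scale normalized corners to pixels.
        box = (round(x1 * width), round(y1 * height), round(x2 * width), round(y2 * height))
        detections.append((class_names[int(class_id)], float(confidence), box))
    return detections


if __name__ == "__main__":
    names = ["background", "bottle", "person"]  # made-up subset for illustration
    rows = [[0, 1, 0.9, 0.1, 0.2, 0.5, 0.8], [0, 2, 0.3, 0.0, 0.0, 1.0, 1.0]]
    print(parse_ssd_rows(rows, names, width=640, height=480))
```

With OpenCV, `net.forward()` on such a model returns an array of shape (1, 1, N, 7), so the rows to pass in would be `out[0, 0]`. Raising `min_confidence` is the first knob to try against the unreliable results described above.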

Offline Model based on YOLOv5 CPU (PyTorch)

https://github.com/OlafenwaMoses/ImageAI

Language: Python
Training required?: Optional. Like YOLOv3, YOLOv5 was trained on only 80 classes, which cover most everyday objects (bottles, cups, cars, etc.). For more specific objects such as a cactus, you have to retrain the model yourself.

Input: Image as stream to library
Returns: Object containing info about detected objects (with descriptions) and bounding boxes
Good for live video feed?: Yes, much faster than YOLOv3; I get several FPS. It is slower than MobileNet SSD but more precise.
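The FPS figures in this comparison (2 FPS for YOLOv3, several FPS for YOLOv5) are easiest to compare when every model is timed the same way on the same frames. A minimal sketch of such a timing harness; `measure_fps` is a hypothetical helper, and `detect` stands in for whatever inference call the respective library exposes.

```python
import time
from typing import Callable, Iterable


def measure_fps(detect: Callable[[object], object], frames: Iterable[object]) -> float:
    """Run `detect` on every frame and return the average frames per second."""
    count = 0
    start = time.perf_counter()
    for frame in frames:
        detect(frame)
        count += 1
    elapsed = time.perf_counter() - start
    if count == 0 or elapsed == 0:
        return 0.0
    return count / elapsed


if __name__ == "__main__":
    # Stand-in detector that takes about 0.5 s per frame (roughly 2 FPS).
    def fake_detect(frame):
        time.sleep(0.5)
        return []

    print(f"{measure_fps(fake_detect, range(2)):.1f} FPS")
```

Timing includes only the detection call, so frame capture and drawing overhead should be measured separately if they matter for the live feed.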

How to label images for offline models

To label images for offline models, Roboflow (https://app.roboflow.com/) does a really good job.

After labeling all images, you can export them in the format your model requires.
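For YOLO-family models, labels are typically stored as one `.txt` file per image with one line per object, `class_id x_center y_center width height`, where all four coordinates are normalized to the image size. A minimal sketch of the conversion in both directions, handy for spot-checking an export; `to_yolo_line` and `from_yolo_line` are hypothetical helpers.

```python
from typing import Tuple


def to_yolo_line(class_id: int, x1: float, y1: float, x2: float, y2: float,
                 image_width: int, image_height: int) -> str:
    """Convert a pixel box (corners) to a YOLO label line with normalized center and size."""
    x_center = (x1 + x2) / 2 / image_width
    y_center = (y1 + y2) / 2 / image_height
    width = (x2 - x1) / image_width
    height = (y2 - y1) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def from_yolo_line(line: str, image_width: int,
                   image_height: int) -> Tuple[int, Tuple[float, float, float, float]]:
    """Parse a YOLO label line back into (class_id, (x1, y1, x2, y2)) in pixels."""
    parts = line.split()
    class_id = int(parts[0])
    xc, yc, w, h = map(float, parts[1:5])
    return class_id, ((xc - w / 2) * image_width, (yc - h / 2) * image_height,
                      (xc + w / 2) * image_width, (yc + h / 2) * image_height)


if __name__ == "__main__":
    line = to_yolo_line(0, 160, 120, 480, 360, 640, 480)
    print(line)
    print(from_yolo_line(line, 640, 480))
```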

artiso-solutions / CoVoX

Investigation of multiple object detection services #42