artiso-solutions / CoVoX

MIT License

Investigation of multiple object detection services #42

Closed: vgibilmanno closed this issue 3 years ago

vgibilmanno commented 3 years ago

Azure Computer Vision

Language: C#
Training required?: No
Input: Image as a stream to the API endpoint
Returns: JSON with the detected objects (with descriptions), their bounding boxes, and an overall image description
Good for live video feed?: No. Frame-by-frame detection would consume too many API calls.

It only works for very basic everyday objects such as bottles and cups. Detecting a cactus, for example, does not work.
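To put the live-feed concern in numbers, here is a rough sketch that counts how many API calls frame-by-frame detection would make. The frame rate and duration are illustrative assumptions, not measurements from this investigation, and the helper name `calls_needed` is hypothetical.

```python
def calls_needed(fps: float, seconds: float, every_nth_frame: int = 1) -> int:
    """Number of API requests if every n-th frame is sent for detection."""
    if every_nth_frame < 1:
        raise ValueError("every_nth_frame must be >= 1")
    total_frames = int(fps * seconds)
    # Ceiling division: frames 0, n, 2n, ... are sent to the API.
    return -(-total_frames // every_nth_frame)


if __name__ == "__main__":
    # Assumed example: a 30 FPS camera running for one minute.
    print(calls_needed(fps=30, seconds=60))
    # Same feed, but only one frame per second is sent.
    print(calls_needed(fps=30, seconds=60, every_nth_frame=30))
```

Sending only every n-th frame cuts the call count proportionally, at the cost of detections that lag behind the video.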


Azure Custom Vision

Language: C#
Training required?: Yes. You have to upload images and label every object in every image. This can be done programmatically, by providing JSON with bounding-box regions and a label, or with the online image tagger provided by Microsoft. Training on cactus images worked well and detection was acceptable, but the training dataset wasn't varied enough to give satisfactory results.

Input: Image as a stream to the API endpoint
Returns: JSON with the detected objects (with descriptions), their bounding boxes, and an overall image description
Good for live video feed?: No. Frame-by-frame detection would consume too many API calls.
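For the programmatic labeling route, the bounding boxes usually have to be converted first. As far as I know, Custom Vision takes regions as values normalized to the 0..1 range rather than as pixels. The sketch below shows that conversion in Python for brevity; the `Region` class and `to_normalized_region` helper are hypothetical stand-ins, not types from the Custom Vision SDK.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical stand-in for a labeled region (not the SDK type)."""
    tag: str
    left: float
    top: float
    width: float
    height: float


def to_normalized_region(tag: str, x: float, y: float, w: float, h: float,
                         image_width: int, image_height: int) -> Region:
    """Convert a pixel bounding box (top-left x, y plus size) to 0..1 values."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    return Region(
        tag=tag,
        left=x / image_width,
        top=y / image_height,
        width=w / image_width,
        height=h / image_height,
    )


if __name__ == "__main__":
    # Assumed example: a cactus box in an 800x600 image.
    print(to_normalized_region("cactus", 100, 50, 200, 300, 800, 600))
```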


Offline Model based on YOLOv3 CPU (ImageAI)

https://github.com/OlafenwaMoses/ImageAI

Language: Python
Training required?: Optional. The pretrained YOLOv3 model knows only the 80 classes it was trained on, which cover most everyday objects (bottles, cups, cars, etc.). For more specific objects such as a cactus, you have to retrain the model yourself.

Input: Image as a stream to the library
Returns: Object with the detected objects (with descriptions) and their bounding boxes
Good for live video feed?: Yes, though very slow. With the fastest approach I was able to get 2 FPS.
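FPS figures like the one above can be reproduced with a small timing loop. This is a minimal sketch, not the script used for this investigation: `measure_fps` and `fake_detect` are hypothetical names, and the fake detector only sleeps to simulate inference time.

```python
import time
from typing import Callable, Iterable


def measure_fps(detect: Callable[[object], object],
                frames: Iterable[object]) -> float:
    """Run `detect` on every frame and return the average frames per second."""
    count = 0
    start = time.perf_counter()
    for frame in frames:
        detect(frame)
        count += 1
    elapsed = time.perf_counter() - start
    if count == 0 or elapsed <= 0:
        return 0.0
    return count / elapsed


def fake_detect(frame: object) -> list:
    """Placeholder for a real model call; pretends inference takes 10 ms."""
    time.sleep(0.01)
    return []


if __name__ == "__main__":
    fps = measure_fps(fake_detect, range(20))
    print(f"{fps:.1f} FPS")
```

To time a real model, pass its detection function in place of `fake_detect` and feed it frames read from the camera or a video file.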


Offline Model based on MobileNet SSD CPU

Language: Python
Training required?: Optional

Input: Image as a stream to the library
Returns: Object with the detected objects (with descriptions) and their bounding boxes
Good for live video feed?: Yes, very fast. Probably the best model for live video prediction.

The results are very unreliable, though, and only work for really basic objects. Training this model on a custom dataset could improve them.

Video: https://streamable.com/3qh0mf


Offline Model based on YOLOv5 CPU (PyTorch)

https://github.com/OlafenwaMoses/ImageAI

Language: Python
Training required?: Optional. Like YOLOv3, the pretrained YOLOv5 model knows only the 80 classes it was trained on, which cover most everyday objects (bottles, cups, cars, etc.). For more specific objects such as a cactus, you have to retrain the model yourself.

Input: Image as a stream to the library
Returns: Object with the detected objects (with descriptions) and their bounding boxes
Good for live video feed?: Yes, much faster than YOLOv3. I get multiple FPS, so it is slower than MobileNet SSD but more precise.


How to label images for offline models

To label images for the offline models, Roboflow does a really good job: https://app.roboflow.com/

After labeling all images, you can export them in the format your target model requires.
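As an illustration of what such an export contains, the YOLO text format stores one line per object: a class index followed by the box center and size, all normalized to the image dimensions. The sketch below builds one such line by hand. The helper name `to_yolo_line` and the example values are assumptions for illustration, and a tool like Roboflow produces these files for you.

```python
def to_yolo_line(class_id: int, x_min: float, y_min: float,
                 x_max: float, y_max: float,
                 image_width: int, image_height: int) -> str:
    """Format a pixel bounding box as a YOLO label line.

    Output: "<class> <x_center> <y_center> <width> <height>", values in 0..1.
    """
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


if __name__ == "__main__":
    # Assumed example: class 0 ("cactus") in an 800x600 image.
    print(to_yolo_line(0, 100, 50, 300, 350, 800, 600))
```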
