google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0

What is the "object_detection_ssd_mobilenetv2_oidv4_fp16.tflite" file for? #1526

Closed by dnap512 3 years ago

dnap512 commented 3 years ago

While touring the TFLite models, I found the object_detection_ssd_mobilenetv2_oidv4_fp16.tflite file: https://github.com/google/mediapipe/blob/master/mediapipe/models/object_detection_ssd_mobilenetv2_oidv4_fp16.tflite. It's not included in the description of the MediaPipe models. Could you tell me where this file is used?

ahmadyan commented 3 years ago

The Objectron model (3D Object Detection) is a two-stage detector. The first stage detects a 2D object crop using the model you linked above, while the second stage estimates the 3D bounding box from the given crop.

3D Object Detection

You can refer to the TensorFlow object detection tutorial for more information and examples.
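
As a rough illustration of the hand-off between the two stages, the sketch below converts a normalized 2D detection into a pixel rectangle that could be cropped and passed to the second stage. The function name `box_to_pixel_rect` and the `[ymin, xmin, ymax, xmax]` box order are assumptions made for illustration; they are not taken from the MediaPipe code.

```python
def box_to_pixel_rect(box, width, height):
    """Map a normalized [ymin, xmin, ymax, xmax] box to (left, top, right, bottom) pixels."""
    # Clamp to the image so slightly out-of-range detections still give a valid crop.
    ymin, xmin, ymax, xmax = (min(max(v, 0.0), 1.0) for v in box)
    left, top = int(round(xmin * width)), int(round(ymin * height))
    right, bottom = int(round(xmax * width)), int(round(ymax * height))
    if right <= left or bottom <= top:
        raise ValueError("box has zero or negative area after clamping")
    return left, top, right, bottom


# Example: a detection covering the central region of a 640x480 frame.
rect = box_to_pixel_rect([0.25, 0.25, 0.75, 0.75], width=640, height=480)
```

Rejecting zero-area boxes up front keeps a bad first-stage detection from silently producing an empty crop for the second stage.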

dnap512 commented 3 years ago

Thank you for the reply.

nargeshn commented 2 years ago

I am trying to use object_detection_ssd_mobilenetv2_oidv4_fp16.tflite to get 2D object crops and use them as input to the second network, but the results are not meaningful. Is the model trained? If so, which dataset was used for training?

ahmadyan commented 2 years ago

The SSD object detector is trained on JFT.

nargeshn commented 2 years ago

Thanks for your response! I have two more questions and would appreciate your help. I apply TensorFlow's tf.image.combined_non_max_suppression to the detected bounding boxes to get the normalized coordinates and the label of the detected class. My questions are:

  1. All detected bounding boxes are classified as class one (label 0). Does it mean that this network is mainly used to get region proposals and does not return the class labels of detected boxes?
  2. I tested the network on simple images of one or two cups, but after non-max suppression most of the results either have a width or height of zero, or are much larger or much smaller than the 2D ground-truth bounding box. In other words, the detected bounding boxes don't match the ground-truth bounding boxes. Is this behavior expected from the model?

ahmadyan commented 2 years ago

I recommend creating a new issue in MediaPipe, as this issue is already closed.

The behavior you are describing does not sound normal. You can see the expected behavior in https://google.github.io/mediapipe/solutions/box_tracking and specifically in this graph for object detection with SSD: https://github.com/google/mediapipe/blob/master/mediapipe/graphs/tracking/subgraphs/object_detection_gpu.pbtxt.
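
One possible cause of the symptoms described above, not confirmed anywhere in this thread, is that the raw model output is being passed to non-max suppression without first being decoded against the SSD anchors. Many SSD exports emit anchor-relative encodings rather than final boxes. The sketch below shows the common decoding step under that assumption. The `(ty, tx, th, tw)` layout and the 10/10/5/5 scale factors are typical defaults and are assumptions here; check them against the graph's detection-decoding options before relying on them. The function name `decode_ssd_box` is hypothetical.

```python
import math


def decode_ssd_box(raw, anchor, x_scale=10.0, y_scale=10.0, w_scale=5.0, h_scale=5.0):
    """Decode one anchor-relative encoding into normalized [ymin, xmin, ymax, xmax].

    raw    = (ty, tx, th, tw), assumed output layout of the network
    anchor = (cy, cx, h, w) in normalized image coordinates
    """
    ty, tx, th, tw = raw
    a_cy, a_cx, a_h, a_w = anchor
    # Center offsets are expressed in units of the anchor size.
    cy = ty / y_scale * a_h + a_cy
    cx = tx / x_scale * a_w + a_cx
    # Sizes are predicted in log space relative to the anchor size.
    h = math.exp(th / h_scale) * a_h
    w = math.exp(tw / w_scale) * a_w
    return [cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2]


# An all-zero encoding should reproduce the anchor itself.
box = decode_ssd_box((0.0, 0.0, 0.0, 0.0), anchor=(0.5, 0.5, 0.2, 0.4))
```

If the encodings are treated as corner coordinates directly, boxes with zero or wildly wrong extents are a plausible outcome, which would match the report above.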

nargeshn commented 2 years ago

Thanks! Sure, I'll create a new issue.