dkurt / openvino_efficientdet

EfficientDet with Intel OpenVINO
https://github.com/openvinotoolkit/openvino
Apache License 2.0

Wrong classifications and additional bounding boxes on EfficientDet D0 Optimized #3

Closed campos537 closed 4 years ago

campos537 commented 4 years ago

First of all, I would like to say thanks for this repository, which has been really helpful for running EfficientDet optimized with OpenVINO. Following the repo's CI file made the conversion to the MO format easy. But when testing on a custom dataset I got much lower results compared to other models, such as the one from Yet-Another-EfficientDet-Pytorch, even when using the same confidence threshold of 0.2. I thought I was using the wrong label map, but realized it is COCO minus 1: labels = {0: "person", 2: "car", 3: "motorcycle", 5: "bus", 7: "truck"}. The detections themselves are actually quite good, with just a few additional bounding boxes, but the classification shows really wrong results; see the images below, generated with EfficientDet D0:

Yet-Another-EfficientDet-Pytorch img_inferred_d0_this_repo_0

automl EfficientDet 0

And this is the detection using the label map above with the MO model: imgout

dkurt commented 4 years ago

Hi!

Really glad to see that this repository is useful for you!

Yes, there is an issue right now: you should compare the AutoML class id with the OpenVINO class id + 1

https://github.com/dkurt/openvino_efficientdet/blob/50233ece06d825bea04037a38222a5afcd8bfabf/scripts/validate.py#L146-L147
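As a quick illustration (a hypothetical sketch, not code from this repo; the helper name and label dict are mine), the off-by-one correction described above could look like:

```python
# Hypothetical sketch: map an OpenVINO class id to its AutoML/COCO label,
# assuming the AutoML id is simply the OpenVINO id + 1, as described above.
COCO_LABELS = {1: "person", 3: "car", 4: "motorcycle", 6: "bus", 8: "truck"}

def automl_label(openvino_class_id: int) -> str:
    # OpenVINO class id + 1 == AutoML class id
    return COCO_LABELS.get(openvino_class_id + 1, "unknown")

print(automl_label(0))  # person
print(automl_label(2))  # car
```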

campos537 commented 4 years ago

Hey!

Thanks for the answer. I saw that, when running the run_opencv.py script, the person label has the value 0. What I found weird is that even with the label map fixed, the classes are still wrong; as you can see in the image, a lot of cars are detected as buses and trucks. I don't know if I'm doing anything wrong or if it's an issue.

campos537 commented 4 years ago

Just to give closure to this issue: I tested with the latest version you added to the repo and the results match the TF frozen graph. What I realized was the following:

  1. The frozen graph doesn't detect bounding boxes with confidence lower than 0.4, while the OpenVINO model still does.
  2. Compared to the frozen graph, the original TF model has better classification and detections at confidences higher than 0.2.

So the problem seems to be with the frozen graph itself, not with this conversion to MO.

dkurt commented 4 years ago

Yes, the TensorFlow graph performs Non-Maximum Suppression inside the network inference, so it has a hardcoded NMS threshold and confidence threshold. With the OpenVINO model you can do the same by editing the .xml file (the DetectionOutput layer parameters).
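For instance (a minimal sketch, assuming the IR stores these parameters as attributes of the layer's `data` element, which is how DetectionOutput attributes such as `confidence_threshold` and `nms_threshold` usually appear; verify against your actual .xml before relying on this), the thresholds could be patched with a short script:

```python
# Sketch: patch the NMS/confidence thresholds of a DetectionOutput layer
# in an OpenVINO IR .xml file. Attribute names follow the usual
# DetectionOutput spec; check them against your own IR file first.
import xml.etree.ElementTree as ET

def set_detection_thresholds(xml_path, conf_thr, nms_thr, out_path):
    tree = ET.parse(xml_path)
    for layer in tree.iter("layer"):
        if layer.get("type") == "DetectionOutput":
            data = layer.find("data")
            if data is not None:
                data.set("confidence_threshold", str(conf_thr))
                data.set("nms_threshold", str(nms_thr))
    tree.write(out_path)
```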

dkurt commented 4 years ago

@campos537, can you please provide the original image so I can validate?

campos537 commented 4 years ago

Sure, this is the image I'm testing with. Also, I'm using the D0 model for that. img

dkurt commented 4 years ago

Well, the problem is that the image has a different aspect ratio (1080x1920, while the network works with 512x512). Internally, TensorFlow performs normalization -> resize (512x288) -> zero padding (512x512), but OpenVINO processes an already-resized input.

So the best choice is to train the model at the same resolution that is used in deployment.

Or remove the preprocessing from the network completely and do it before inference.

campos537 commented 4 years ago

Got it, thanks for all the help. Just one more question: comparing the TensorFlow model after freezing, I get the same issues as with the OpenVINO version. Does this mean the frozen graph is doing the same preprocessing as OpenVINO?

dkurt commented 4 years ago

@campos537, for example, if you have an image of size 512x512, both OpenVINO and the frozen TensorFlow graph will produce the same output (because resize+padding is disabled). But if the image's width and height are not equal, OpenVINO and the frozen TensorFlow graph will produce different results (because of the preprocessing).

Your issue with the difference between the frozen graph and the AutoML runtime is probably related to train/test modes. A network in train mode works differently (e.g. Dropout, BatchNorm layers). Freezing the graph switches all layers to test mode.

campos537 commented 4 years ago

Thanks, man, I will try to get the same result here by changing the preprocessing. If I do the same preprocessing as TensorFlow and feed the result as input to the OpenVINO model, I should get the same output as regular TensorFlow, right?

campos537 commented 4 years ago

Hey @dkurt, just one last comment: I tested doing the padding manually and it worked way better! Thanks for the help.

dkurt commented 3 years ago

Hi, @campos537! I was able to reproduce the TensorFlow preprocessing, so it can be used to get exact predictions! Please take a look at https://github.com/dkurt/openvino_efficientdet/pull/10.

TensorFlow predictions
0.94601697 21 163 505 711
0.74208856 634 406 713 454
0.69452524 458 335 505 365
0.67427874 640 336 694 367
0.6548096 391 314 422 344
0.5890408 746 382 816 427
0.56840366 549 358 601 399

OpenVINO predictions
0.9460169 21 163 505 711
0.7420901 635 406 713 454
0.69452417 459 335 505 365
0.6742812 640 336 694 367
0.65481055 391 314 422 345
0.5890416 746 383 816 427
0.56840587 549 359 601 400
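For completeness, the two lists can be checked against each other programmatically (a small sketch using the numbers above, allowing 1 px in coordinates and a small tolerance in scores):

```python
# Compare the TF and OpenVINO detections listed above:
# (score, x1, y1, x2, y2) per detection.
tf_preds = [
    (0.94601697, 21, 163, 505, 711),
    (0.74208856, 634, 406, 713, 454),
    (0.69452524, 458, 335, 505, 365),
    (0.67427874, 640, 336, 694, 367),
    (0.6548096, 391, 314, 422, 344),
    (0.5890408, 746, 382, 816, 427),
    (0.56840366, 549, 358, 601, 399),
]
ov_preds = [
    (0.9460169, 21, 163, 505, 711),
    (0.7420901, 635, 406, 713, 454),
    (0.69452417, 459, 335, 505, 365),
    (0.6742812, 640, 336, 694, 367),
    (0.65481055, 391, 314, 422, 345),
    (0.5890416, 746, 383, 816, 427),
    (0.56840587, 549, 359, 601, 400),
]

def match(a, b, px_tol=1, score_tol=1e-4):
    # Same detection if scores are close and each coordinate is within 1 px.
    return abs(a[0] - b[0]) <= score_tol and all(
        abs(x - y) <= px_tol for x, y in zip(a[1:], b[1:]))

assert all(match(t, o) for t, o in zip(tf_preds, ov_preds))
```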

There is a +/- 1 pixel difference in coordinates due to rounded aspect ratios in the automl_efficientdet.json file. If you need a perfect match, replace them with

"aspect_ratios": [1.0, 1.0, 1.41421356, 0.70710678, 0.70710678, 1.41421356],

source: for each a in (1.0, 2.0, 0.5), append (sqrt(a), 1 / sqrt(a))
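That derivation can be reproduced in a few lines of Python:

```python
# Reproduce the aspect_ratios list: for each base ratio a,
# append sqrt(a) and 1/sqrt(a), rounded to 8 decimals.
from math import sqrt

aspect_ratios = []
for a in (1.0, 2.0, 0.5):
    aspect_ratios += [round(sqrt(a), 8), round(1 / sqrt(a), 8)]

print(aspect_ratios)
# [1.0, 1.0, 1.41421356, 0.70710678, 0.70710678, 1.41421356]
```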

campos537 commented 3 years ago

Hey @dkurt, this is really great news! In my tests I was also able to reproduce almost the same result by creating a 512x512 image while maintaining the same aspect ratio as the original, filling the remaining area with black pixels. That's really good to know!