JungLearnBot / RPi5_yolov8

Experiment with YOLOv8 on Raspberry Pi 5
MIT License

YOLOv8 inference on Coral TPU does not return any detection results on valid image #1

Closed ukicomputers closed 8 months ago

ukicomputers commented 8 months ago

Hi @JungLearnBot! First of all, thanks for the great project! I'm the guy from YouTube who wrote the comment.

Hardware specification

I am using the Coral USB Accelerator on a regular PC (currently running Ubuntu 23.04 with Python 3.9 and all the packages you listed installed, except TensorFlow, which is not required in this configuration). I successfully ran the examples from the coral.ai/examples website, so I can be sure the accelerator works. The difference between the PCIe and USB accelerators is minor, so don't worry about that.
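
For reference, this is roughly the sanity check I did first, just to confirm the runtime sees the accelerator (a minimal sketch using the pycoral API; the model path is only a placeholder):

from pycoral.utils.edgetpu import list_edge_tpus, make_interpreter

# List every Edge TPU device the runtime can see (USB or PCIe).
print(list_edge_tpus())  # e.g. [{'type': 'usb', 'path': ...}]

# Loading any Edge TPU-compiled model confirms the delegate works end to end.
interpreter = make_interpreter('some_model_edgetpu.tflite')  # placeholder path
interpreter.allocate_tensors()
print("Edge TPU delegate loaded OK")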

Usage

I wrote a quick script to use the library:

import cv2
from ultralytics.utils.plotting import Annotator
from yolo_manager import YoloDetectorWrapper

def draw_annotation(img, label_names, results):
    annotator = None
    for r in results:
        annotator = Annotator(img)

        boxes = r.boxes
        for box in boxes:
            b = box.xyxy[0]  # box coordinates in (x1, y1, x2, y2) format
            c = box.cls
            annotator.box_label(b, label_names[int(c)])
            print("the box")
            print(box)

    if annotator is not None:
        annotated_img = annotator.result()
    else:
        annotated_img = img.copy()

    return annotated_img

image = cv2.imread("bus.jpg")
image = cv2.resize(image, (192, 192))
detector = YoloDetectorWrapper("192.tflite", True)
results = detector.predict(image)
print(results)

draw_annotation(image, detector.get_label_names(), results)

cv2.imshow("izlaz", image)
cv2.waitKey(0)

The script is probably valid. But when I run inference, I get the following result (the results object), yet results.boxes does not contain anything, which means nothing was detected. Here is its output:

[ultralytics.engine.results.Results object with attributes:

boxes: ultralytics.engine.results.Boxes object
keypoints: None
masks: None
names: {0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench', 14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee', 30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat', 35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl', 46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 51: 'carrot', 52: 'hot dog', 53: 'pizza', 54: 'donut', 55: 'cake', 56: 'chair', 57: 'couch', 58: 'potted plant', 59: 'bed', 60: 'dining table', 61: 'toilet', 62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 66: 'keyboard', 67: 'cell phone', 68: 'microwave', 69: 'oven', 70: 'toaster', 71: 'sink', 72: 'refrigerator', 73: 'book', 74: 'clock', 75: 'vase', 76: 'scissors', 77: 'teddy bear', 78: 'hair drier', 79: 'toothbrush'}
orig_img: array([[[125, 152, 176],
        [129, 163, 187],
        [126, 166, 191],
        ...,
        [144, 159, 175],
        [161, 174, 190],
        [158, 171, 185]],

       [[123, 150, 175],
        [128, 161, 185],
        [127, 164, 190],
        ...,
        [142, 157, 174],
        [162, 175, 189],
        [157, 170, 184]],

       [[123, 153, 176],
        [124, 157, 181],
        [128, 165, 191],
        ...,
        [146, 161, 177],
        [163, 177, 189],
        [158, 171, 185]],

       ...,

       [[109, 102, 105],
        [ 96,  90,  92],
        [ 96,  93,  98],
        ...,
        [122, 114, 117],
        [117, 108, 110],
        [104,  96,  97]],

       [[113, 114, 119],
        [116, 116, 120],
        [ 89,  82,  89],
        ...,
        [127, 120, 130],
        [106,  99, 104],
        [107,  98, 102]],

       [[171, 171, 177],
        [158, 159, 163],
        [164, 163, 167],
        ...,
        [121, 116, 121],
        [127, 121, 126],
        [113, 105, 110]]], dtype=uint8)
orig_shape: (192, 192)
path: ''
probs: None
save_dir: None
speed: {'preprocess': None, 'inference': None, 'postprocess': None}]

The image source is from here; it is the official image for trying out detection with YOLOv8.

The possible issue

I am not too deep into this topic (YOLOv8), but the cause may be that some filter is not applied. If I am right, help would be appreciated.
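
To illustrate what I mean by a filter: as far as I understand, the raw YOLOv8 TFLite output is a single (1, 84, N) tensor (4 box coordinates plus 80 class scores per candidate box), and a confidence filter roughly like the one below has to run before any boxes appear. This is only a sketch under that assumption (NMS omitted), not this repo's actual code:

import numpy as np

def filter_yolov8_output(raw_int8, scale, zero_point, conf_thres=0.25):
    # Dequantize the int8 tensor: real = scale * (q - zero_point)
    preds = scale * (raw_int8.astype(np.float32) - zero_point)  # (1, 84, N)
    preds = preds[0].T                                          # (N, 84): xywh + 80 class scores
    boxes, scores = preds[:, :4], preds[:, 4:]
    cls_ids = scores.argmax(axis=1)
    confs = scores.max(axis=1)
    keep = confs > conf_thres  # drop low-confidence candidates
    return boxes[keep], cls_ids[keep], confs[keep]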

Thanks in advance.

ukicomputers commented 8 months ago

@JungLearnBot, if you can help me solve this issue, please reply with what this code returns when you run it with your model and this image.

from PIL import Image
from pycoral.adapters import common
from pycoral.adapters import detect
from pycoral.utils.edgetpu import make_interpreter

interpreter = make_interpreter('yolov8n.tflite') # replace this with your model from models dir
interpreter.allocate_tensors()

image = Image.open('bus.jpg') # replace this with image that I gave you
_, scale = common.set_resized_input(interpreter, image.size, lambda size: image.resize(size, Image.LANCZOS))

interpreter.invoke()
print("")
print(interpreter.get_output_details())
print("")
print(interpreter._get_full_signature_list())
print("")

detections = detect.get_objects(interpreter, 0.4, scale) # don't worry if an error happens here; get_objects expects SSD-style output tensors, so it may raise on a YOLOv8 model
print(detections)

It would be very helpful if you could reply :). Thanks in advance!

ukicomputers commented 8 months ago

I can confirm it is not anything to do with filters, nor with using cv2.imread instead of Picamera2's capture_array (both calls return a NumPy array). Also worth a look: jveitchmichaelis/edgetpu-yolo; his implementation works fully for me. I will keep you updated.

ukicomputers commented 8 months ago

Weird: when using a YOLOv5 pre-trained model, I get proper results with your script.

JungLearnBot commented 8 months ago

@ukicomputers, I tried your code and got the same result: not much detection out of it. But that does not mean there is something wrong with the code. You have to keep in mind that the detection quality of a quantized 192x192 model is very low unless you tune/train it to detect a very specific target (which is another big challenge).

When I was experimenting with camera input, I was in the frame the whole time, yet it quite often failed to detect me as a person even though I was fairly close to the camera. In my experience, 192x192 is too low a resolution to detect anything in most cases.
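
If you need better quality, one thing worth trying is exporting the model at a larger input size. This is only a sketch with the Ultralytics export API (I have not checked how much of the larger graph still compiles onto the Edge TPU):

from ultralytics import YOLO

# Larger inputs detect small objects much better, at the cost of speed.
model = YOLO("yolov8n.pt")
model.export(format="edgetpu", imgsz=320)  # writes an *_edgetpu.tflite next to the weights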

Another minor thing about your first code: the "draw_annotation" function returns a new image, so it should be

result_annotation_img = draw_annotation(image, detector.get_label_names(), results)
cv2.imshow("izlaz", result_annotation_img)
cv2.waitKey(0)

instead of:

draw_annotation(image, detector.get_label_names(), results)
cv2.imshow("izlaz", image)
cv2.waitKey(0)

Hope this feedback helps you with your project.

ukicomputers commented 8 months ago

Yes, I did not write the code well, sorry; however, it still did not get any inference results. Maybe it is because of the low resolution, but I wonder if that is really true.
I will try again today after the competition with a very small image of me very close to the camera, but I am now wondering why YOLOv5 returned successful detections with your code at the same resolution. If you could also try to detect and get detection results (thanks, firstly, for your reply and the time you spent), I will write the code after your reply.

ukicomputers commented 8 months ago

I still haven't tried it. Also, there is not much need for you to test, because your code is probably right. Yes, you have a point, thanks: the 192x192 image resolution is very low, and when I think about what the image looks like after resizing that portrait phone photo to 192x192, the resulting proportions look very wrong; even a person needs time to work out what it is, let alone the computer. Thanks again! I will post an update if needed.
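
On the proportions problem: resizing a portrait photo straight to 192x192 stretches it badly. A letterbox resize (keep the aspect ratio, pad the rest) avoids that; this is a generic sketch, not the wrapper's actual preprocessing:

import cv2
import numpy as np

def letterbox(img, size=192, pad_value=114):
    # Scale so the longer side fits, then pad to a square canvas
    # instead of stretching the image and distorting proportions.
    h, w = img.shape[:2]
    r = size / max(h, w)
    nh, nw = int(round(h * r)), int(round(w * r))
    resized = cv2.resize(img, (nw, nh))
    canvas = np.full((size, size, 3), pad_value, dtype=img.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas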

ukicomputers commented 8 months ago

Still nothing. I tried with the following image, cropped to exactly 192x192: dog. And I got no result. Do you maybe have something else in mind? Thanks.

By the way, in case it helps, this is a log of the important things:

"""self.interpreter.get_output_details():"""  [{'name': 'PartitionedCall:0', 'index': 1, 'shape': array([  1,  84, 756], dtype=int32), 'shape_signature': array([  1,  84, 756], dtype=int32), 'dtype': <class 'numpy.int8'>, 'quantization': (0.005066452547907829, -128), 'quantization_parameters': {'scales': array([  0.0050665], dtype=float32), 'zero_points': array([-128], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}] 

"""result during enumerate(preds):""" pred:  tensor([], size=(0, 6))
pred[:, :4]:  tensor([], size=(0, 4)) 

"""results during enumerate(preds):"""  [ultralytics.engine.results.Results object with attributes:

boxes: ultralytics.engine.results.Boxes object
keypoints: None
masks: None
names: # I deleted here from this conversation because of size, but it's yolo_default_label_names
orig_img: array([[[156, 150, 169],
        [171, 167, 186],
        [173, 173, 191],
        ...,
        [132, 132, 150],
        [130, 130, 148],
        [119, 119, 137]],

       [[172, 166, 185],
        [170, 166, 185],
        [167, 167, 185],
        ...,
        [145, 144, 160],
        [143, 142, 158],
        [134, 133, 149]],

       [[172, 169, 185],
        [157, 156, 172],
        [157, 156, 172],
        ...,
        [134, 134, 148],
        [143, 143, 157],
        [133, 133, 147]],

       ...,

       [[ 62, 126, 130],
        [ 61, 125, 129],
        [ 60, 123, 127],
        ...,
        [ 84,  81, 103],
        [ 90,  87, 109],
        [104, 100, 125]],

       [[ 54, 123, 126],
        [ 53, 122, 125],
        [ 55, 122, 125],
        ...,
        [ 47,  46,  80],
        [ 60,  54,  95],
        [ 72,  66, 107]],

       [[ 51, 122, 125],
        [ 52, 123, 126],
        [ 55, 124, 127],
        ...,
        [ 19,  17,  59],
        [ 17,  11,  60],
        [ 13,   6,  57]]], dtype=uint8)
orig_shape: (192, 192)
path: ''
probs: None
save_dir: None
speed: {'preprocess': None, 'inference': None, 'postprocess': None}] 

Don't worry, this is just printed from the terminal; it is not code, more like a debug message.
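
For anyone else debugging this, a minimal check (a sketch assuming the pycoral interpreter API and the quantization parameters from the log above) that prints the highest dequantized class score, to see whether the model is confident about anything at all:

import numpy as np
from PIL import Image
from pycoral.adapters import common
from pycoral.utils.edgetpu import make_interpreter

interpreter = make_interpreter('192.tflite')  # model name from the script above
interpreter.allocate_tensors()

img = Image.open('bus.jpg').resize(common.input_size(interpreter))
common.set_input(interpreter, img)
interpreter.invoke()

out = interpreter.get_output_details()[0]
scale, zero_point = out['quantization']             # (0.0050665, -128) in the log above
raw = interpreter.get_tensor(out['index'])          # int8, shape (1, 84, 756)
scores = scale * (raw[0, 4:, :].astype(np.float32) - zero_point)
print("max class score:", scores.max())             # near zero -> model truly sees nothing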