experiencor / keras-yolo3

Training and Detecting Objects with YOLO3
MIT License
1.6k stars, 861 forks

Non-Max-Suppression is extremely SLOW up to 5-7 seconds ! #177

Open thusinh1969 opened 5 years ago

thusinh1969 commented 5 years ago

I use yolo3_one_file_to_detect_them_all.py on a 608x608 image. It was so slow that I had to time your entire prediction process. It turned out that do_nms takes 5-7 seconds for an image that has 10 objects (person only).

I am using a Titan X on Ubuntu 16.04. Predictions with all my other models run at 30-35 fps. Any hints, please?

Thank you. Steve

alpotapov commented 5 years ago

Same problem. There are 10000 boxes generated, and do_nms runs Python for-loops over them instead of using vectorized operations. I found this tutorial and will try to integrate their implementation into do_nms.
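For reference, the vectorized approach mentioned above can be sketched in NumPy roughly as follows. This is a minimal illustration and not the tutorial's code or the repository's do_nms: the function name nms_vectorized, the (xmin, ymin, xmax, ymax) box layout, and the 0.5 default threshold are all assumptions, and the sketch ignores class labels entirely.

```python
import numpy as np

def nms_vectorized(boxes, scores, iou_thresh=0.5):
    """Greedy NMS. boxes: [N, 4] as (xmin, ymin, xmax, ymax); scores: [N].

    Hypothetical helper for illustration; returns indices of kept boxes,
    ordered from highest to lowest score.
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float64)
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(-scores)  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Overlap of box i with every remaining box, computed in one shot
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        union = areas[i] + areas[rest] - inter
        iou = inter / np.maximum(union, 1e-12)  # guard against zero-area boxes
        # Keep only the boxes that do not overlap box i too much
        order = rest[iou <= iou_thresh]
    return np.array(keep, dtype=np.int64)
```

The outer loop still runs once per kept box, but the IoU against all remaining candidates is a single array operation, which is where most of the time goes when thousands of boxes are involved.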

AlexM4 commented 5 years ago

decode_netout doesn't filter boxes by threshold. Try replacing if(objectness.all() <= obj_thresh): continue with if (objectness <= obj_thresh).all(): continue (line 302).
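A small check of why the two conditions behave differently. The one-element array and the 0.5 threshold below are assumed values for illustration, not taken from the script:

```python
import numpy as np

obj_thresh = 0.5                # assumed threshold value
objectness = np.array([0.01])   # assumed: a clearly low-confidence box

# Original condition: .all() collapses the array to True (0.01 is nonzero),
# and True <= 0.5 is False, so the low-confidence box is NOT skipped.
original_check = bool(objectness.all() <= obj_thresh)

# Suggested condition: compare first, then reduce, so the box IS skipped.
fixed_check = bool((objectness <= obj_thresh).all())

print(original_check, fixed_check)
```

Since any nonzero objectness makes .all() return True, the original test effectively never filters anything, and every candidate box reaches do_nms.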

FlorinAndrei commented 5 years ago

@AlexM4 That is a MASSIVE improvement!

FlorinAndrei commented 5 years ago

@alpotapov Have you found a better / faster version of do_nms() ?

firefly2442 commented 4 years ago

I used tf.image.non_max_suppression from TensorFlow 2. It was fairly speedy.

AiTeamVusolutions commented 4 years ago

Can you share the code showing how you used tf.image.non_max_suppression?

firefly2442 commented 4 years ago

I believe this is it. I've since switched to the yolov3-tf2 codebase, which implements most of these pieces in TensorFlow. The raw Python I was using before wasn't quite as speedy.

zeeshanbasar commented 3 years ago

I tried writing a custom do_nms function and gained a bit of speed; maybe this will be helpful to someone. I am very new to the whole ML scene, and to programming in general, so please feel free to correct me if there are any issues or any improvements I can make.

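import numpy as np
import tensorflow as tf

def do_nms(boxes, scores, threshold, max_boxes=10):
    # boxes: [N, 4] rows of (ymin, xmin, ymax, xmax); scores: N confidences
    # cast to float32, the float type the TensorFlow op expects
    boxes = tf.cast(boxes, tf.float32)
    scores = tf.cast(scores, tf.float32)
    selected_indices = tf.image.non_max_suppression(
        boxes, scores, max_boxes, threshold)
    selected_boxes = tf.gather(boxes, selected_indices)

    # also return the kept indices so labels and scores can be matched up
    return selected_boxes.numpy().astype(int), selected_indices.numpy()

while True:
    ...  # earlier steps of the loop omitted in the original post

    # get the details of the detected objects
    v_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)

    # pack the box corners into the [N, 4] layout the NMS op expects
    coords = np.empty([len(v_boxes), 4])
    for i in range(len(v_boxes)):
        coords[i] = [v_boxes[i].ymin, v_boxes[i].xmin,
                     v_boxes[i].ymax, v_boxes[i].xmax]

    s_boxes, keep = do_nms(coords, v_scores, 0.5)

    num_preds = len(s_boxes)
    print(num_preds)

    # summarize what we found, using the indices that survived NMS
    for i in keep:
        print(v_labels[i], v_scores[i])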