Open thusinh1969 opened 5 years ago
Same problem. There are 10000 boxes generated and do_nms does for-loops over them instead of doing vectorized operations. I found this tutorial and will try to integrate their implementation into do_nms.
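For anyone landing here, a rough NumPy sketch of what vectorizing the inner loop could look like. This is not the tutorial's code and not the repo's do_nms: the name nms_numpy, the (xmin, ymin, xmax, ymax) box layout, and the class-agnostic behavior are assumptions made for illustration (the script's do_nms works per class, as far as I can tell). The outer loop still runs once per kept box, but each IoU comparison against all remaining boxes is a single array operation.

```python
import numpy as np


def nms_numpy(boxes, scores, iou_thresh=0.5):
    """Greedy, class-agnostic NMS (hypothetical helper).

    boxes: [N, 4] rows of (xmin, ymin, xmax, ymax), an assumed layout.
    Returns the indices of the kept boxes, highest score first.
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    if len(boxes) == 0:
        return np.empty(0, dtype=np.int64)

    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(-scores)  # indices sorted by descending score

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # IoU of the current best box against every remaining box at once
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        union = areas[i] + areas[rest] - inter
        iou = inter / np.maximum(union, 1e-12)  # guard against zero-area boxes
        # drop everything that overlaps the kept box too much
        order = rest[iou <= iou_thresh]
    return np.array(keep, dtype=np.int64)
```

To mimic per-class behavior, one option would be to call this once per class on the boxes assigned to that class.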
decode_netout doesn't filter boxes by threshold. Try replacing if(objectness.all() <= obj_thresh): continue with if (objectness <= obj_thresh).all(): continue (line 302).
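To see why the order matters: objectness.all() collapses the score to a boolean first, so the comparison becomes True <= obj_thresh, which is False for any threshold below 1, and nothing gets skipped. A small sketch with made-up scores (the values and the helper names kept_buggy / kept_fixed are hypothetical, only the two conditions come from the script):

```python
import numpy as np

obj_thresh = 0.5
# hypothetical objectness scores for four candidate boxes
scores = np.array([0.01, 0.02, 0.9, 0.3], dtype=np.float32)


def kept_buggy(scores, thresh):
    kept = []
    for s in scores:
        # s.all() is True for any nonzero score, and True <= 0.5 is False,
        # so this never skips a box
        if s.all() <= thresh:
            continue
        kept.append(float(s))
    return kept


def kept_fixed(scores, thresh):
    kept = []
    for s in scores:
        # compare first, then reduce: skips boxes at or below the threshold
        if (s <= thresh).all():
            continue
        kept.append(float(s))
    return kept


print(len(kept_buggy(scores, obj_thresh)))  # all four boxes survive
print(len(kept_fixed(scores, obj_thresh)))  # only the 0.9 box survives
```

With thousands of candidate boxes, the buggy check hands nearly all of them to do_nms, which would explain most of the slowdown.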
@AlexM4 That is a MASSIVE improvement!
@alpotapov Have you found a better / faster version of do_nms() ?
I used tf.image.non_max_suppression from TensorFlow 2. This was fairly speedy.
Can you share the code showing how you used tf.image.non_max_suppression?
I believe this is it. I've since swapped to using the yolov3-tf2 codebase, which implements most of these steps in TensorFlow. The raw Python I was using before wasn't quite as speedy.
I tried writing a custom do_nms function and gained a bit of speed; maybe this will be helpful for someone. I am very new to the whole ML scene, and to programming in general, so please feel free to correct me if there are any issues or improvements I can make.
import numpy as np
import tensorflow as tf

def do_nms(boxes, scores, threshold):
    # boxes: [N, 4] rows of (ymin, xmin, ymax, xmax); at most 10 boxes are kept
    selected_indices = tf.image.non_max_suppression(
        boxes, scores, 10, threshold)
    selected_boxes = tf.gather(boxes, selected_indices)
    # also return the indices so labels and scores can be matched to the kept boxes
    return selected_boxes.numpy().astype(int), selected_indices.numpy()

while True:
    ...
    ...
    ...
    ...
    # get the details of the detected objects
    v_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)
    # float32, the dtype the TensorFlow NMS op works with
    coords = np.empty([len(v_boxes), 4], dtype=np.float32)
    for i in range(len(v_boxes)):
        coords[i] = [v_boxes[i].ymin, v_boxes[i].xmin,
                     v_boxes[i].ymax, v_boxes[i].xmax]
    s_boxes, keep = do_nms(coords, v_scores, 0.5)
    print(len(s_boxes))
    # summarize what we found, using the indices of the boxes that survived NMS
    for i in keep:
        print(v_labels[i], v_scores[i])
I use yolo3_one_file_to_detect_them_all.py on a 608x608 image. It was so slow that I had to time the entire prediction process. It turned out that do_nms takes 5-7 seconds for an image that has 10 objects (person only).
I am using a Titan X on Ubuntu 16.04. All other models predict at 30-35 fps. Any hints, please?
Thank you. Steve