AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Bounding Boxes not centred after converting Open Images Bounding Boxes #7937

Open Bacus96 opened 3 years ago

Bacus96 commented 3 years ago

I'm having an issue with training YOLOv4. I convert XMin, XMax, YMin and YMax into the YOLO format with the following code:

X = XMAX
Y = YMAX
x_diff = int(XMAX/2)
y_diff = int(YMAX/2)
Width = XMin+x_diff
Height = YMin+y_diff
Width /= int(image.shape[1])
Height /= int(image.shape[0])
X /= int(image.shape[1])
Y /= int(image.shape[0])

Here's an example of the labels before conversion:

XMin | XMax | YMin | YMax
--- | --- | --- | ---
0.401471 | 0.908824 | 0.000000 | 0.913526

After conversion the values look correct, but the boxes are always slightly off the image when I save the annotations as I train, so the models don't train correctly.

Thanks, Jason

stephanecharette commented 3 years ago

I don't understand the code you posted. And the example numbers you posted don't make sense.

But note that the annotation format is not X, Y, W, and H. This is explained here: https://www.ccoderun.ca/programming/darknet_faq/#darknet_annotations

I recommend DarkMark to review your annotations before you train. https://www.ccoderun.ca/darkmark/Summary.html#DarkMarkReview You'll immediately know if something is wrong.
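
For reference, here is a minimal sketch of the conversion in Python. It assumes the Open Images XMin/XMax/YMin/YMax values are already normalized to the 0..1 range (as the example row above suggests), and it writes the Darknet label line as a class index followed by the normalized box centre, width and height, separated by spaces. The function names and the class index `1` are hypothetical and only serve the illustration.

```python
def to_darknet_box(xmin, xmax, ymin, ymax):
    """Convert normalized min/max corners to (x_center, y_center, width, height).

    Assumes all four inputs are already fractions of the image size (0..1),
    so no division by the image width or height is needed here.
    """
    x_center = (xmin + xmax) / 2.0
    y_center = (ymin + ymax) / 2.0
    width = xmax - xmin
    height = ymax - ymin
    return x_center, y_center, width, height


def format_label(class_id, box):
    """Build one space-separated label line: class, centre x, centre y, width, height."""
    x_center, y_center, width, height = box
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Example row from the question; class index 1 is a placeholder.
box = to_darknet_box(xmin=0.401471, xmax=0.908824, ymin=0.0, ymax=0.913526)
print(format_label(1, box))
```

If the source coordinates were in pixels instead, each value would first have to be divided by the image width (for x) or height (for y). Because the values are relative to the image size, the images would not need to be resized to a common size for the labels to stay valid.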

Bacus96 commented 3 years ago

Thanks for the response, that's very helpful. Would images need to be resized to the same size before creating these Bounding Boxes?

I posted partial code before, but here's the full script I use:

`
import pandas as pd
import os.path
import csv
from tqdm import tqdm
import numpy as np

IMAGE_DIR = "Labels/"

def convert(filename_str, coords):
    os.chdir("..")
    image = cv2.imread(filename_str + ".jpg")
    coords[2] -= coords[0]
    coords[3] -= coords[1]
    x_diff = int(coords[2]/2)
    y_diff = int(coords[3]/2)
    coords[0] = coords[0]+x_diff
    coords[1] = coords[1]+y_diff
    coords[0] /= int(image.shape[1])
    coords[1] /= int(image.shape[0])
    coords[2] /= int(image.shape[1])
    coords[3] /= int(image.shape[0])
    os.chdir("Label")
    return coords

classes_coded = [0, 1, 2]
classes_names = ["Unknown","Person","Car"]

input_file = csv.DictReader(open("open_images_miap_boxes_train.csv"))

for line in tqdm(list(input_file)):
    coords = np.asarray([float(line['XMin']), float(line['YMin']), float(line['XMax']), float(line['YMax'])])
    filename_str = (str(line["ImageID"])+".jpg")
    coords = convert(filename_str, coords)
    with open('Labels/%s.txt'%line['ImageID'],'w') as f:
        f.write(','.join([str(classes_names.index(line['GenderPresentation'])),coords[0], coords[1], coords[2], coords[3]])+'\n')
        f.write(','.join([str(classes_names.index(line['GenderPresentation'])), str(coords[0]), str(coords[1]), str(coords[2]), str(coords[3])])+ '\n')

    break
`

agjunyent commented 3 years ago

To convert from min/max coordinates to the YOLO format, the easiest way is: