VisDrone / VisDrone-Dataset

The dataset for drone-based detection and tracking is released, including both images/videos and annotations.

How to understand the format in the annotation file? #2

Closed: xincyu closed this issue 1 year ago

xincyu commented 4 years ago

Hello, a line in the annotation file looks like this: 684,8,273,116,0,0,0,0

How should these numbers be interpreted? Is it x_min, y_min, x_max, y_max, or some other format?

artynet commented 4 years ago

I still can't figure it out... it seems to use pixel values rather than the usual normalized boundaries. Any chance to convert it to the standard YOLO format?

dronefreak commented 4 years ago

Hi @xincyu and @artynet

The DET submission format, as described by the authors, is as follows:

 <bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>

     Name                Description
-------------------------------------------------------------------------------------------------------------------------------
 <bbox_left>         The x coordinate of the top-left corner of the predicted bounding box.

 <bbox_top>          The y coordinate of the top-left corner of the predicted object bounding box.

 <bbox_width>        The width in pixels of the predicted object bounding box.

 <bbox_height>       The height in pixels of the predicted object bounding box.

 <score>             The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing
                     an object instance.
                     The score in the GROUNDTRUTH file is set to 1 or 0: 1 indicates the bounding box is considered in
                     evaluation, while 0 indicates the bounding box will be ignored.

 <object_category>   The object category indicates the type of annotated object, i.e., ignored regions (0), pedestrian (1),
                     people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9),
                     motor (10), others (11).

 <truncation>        The value in the DETECTION result file should be set to the constant -1.
                     The value in the GROUNDTRUTH file indicates the degree of object parts appearing outside the frame,
                     i.e., no truncation = 0 (truncation ratio 0%) and partial truncation = 1 (truncation ratio 1% ~ 50%).

 <occlusion>         The value in the DETECTION file should be set to the constant -1.
                     The value in the GROUNDTRUTH file indicates the fraction of the object being occluded, i.e., no occlusion = 0
                     (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%), and heavy occlusion = 2
                     (occlusion ratio 50% ~ 100%).

Detections in ignored regions or labeled as "others" will not be considered in evaluation. A sample submission of the Faster R-CNN detector can be found on our website.
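
To make the field order above concrete, here is a minimal parsing sketch (not the official toolkit code, just an illustration) that assumes a plain 8-value comma-separated line like the one in the original question:

```python
# Minimal sketch: parse one line of a VisDrone DET ground-truth file
# using the field order described above.
def parse_visdrone_line(line):
    fields = [int(v) for v in line.strip().split(",") if v != ""]
    bbox_left, bbox_top, bbox_width, bbox_height, score, category, truncation, occlusion = fields
    # Convert (left, top, width, height) to (x_min, y_min, x_max, y_max) in pixels.
    x_min, y_min = bbox_left, bbox_top
    x_max, y_max = bbox_left + bbox_width, bbox_top + bbox_height
    return {
        "xyxy": (x_min, y_min, x_max, y_max),
        "score": score,          # in ground truth: 1 = considered, 0 = ignored
        "category": category,    # 0 = ignored regions, 1 = pedestrian, ..., 11 = others
        "truncation": truncation,
        "occlusion": occlusion,
    }

print(parse_visdrone_line("684,8,273,116,0,0,0,0"))
```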

adityachintala commented 1 year ago

Hey, we're trying to convert these to the YOLO text format. Any luck converting from this format to that?

adityatandon commented 1 year ago

Hey @adityachintala, YOLOv5 has a script that allows quick conversion from the VisDrone format to the YOLO format. Check it out here.

Gareth1995 commented 1 year ago

Hey @adityatandon, do you know where I can find an explanation of what the script is doing? I see it subtracts 1 from all classes; I thought this was to combine the people and pedestrian classes, but when I train the model using this YAML I'm still getting results for pedestrian.

adityatandon commented 1 year ago

Hey @Gareth1995, I don't remember seeing a full explanation of the code anywhere. However, I can offer my two cents on what I've understood.

In the original annotations, the first 4 numbers represent the bounding box, the 5th number represents whether the annotation is ignored or considered, and the 6th number represents the class label.

box = convert_box(img_size, tuple(map(int, row[:4]))) pulls the first 4 numbers for the bounding box and converts it to the YOLO bounding box format.

cls = int(row[5]) - 1 reads the class label at position 6 and subtracts 1 from it, since YOLO uses class labels in the range 0-9 while the original VisDrone annotations use class labels in the range 1-10.

This is my understanding of it, hope I was able to be of help.
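
Putting those two steps together, a hedged sketch of the conversion (this is not the exact YOLOv5 convert_box code, just the idea): VisDrone stores (left, top, width, height) in pixels, while YOLO expects (x_center, y_center, width, height) normalized by image size, with the class index shifted from 1-10 down to 0-9.

```python
# Sketch of one annotation row being converted to a YOLO label line.
def visdrone_to_yolo_line(row, img_w, img_h):
    left, top, w, h = map(int, row[:4])
    score, category = int(row[4]), int(row[5])
    if score == 0 or category in (0, 11):
        return None  # skip ignored boxes, ignored regions (0), and "others" (11)
    x_center = (left + w / 2) / img_w
    y_center = (top + h / 2) / img_h
    cls = category - 1  # shift 1-10 -> 0-9
    return f"{cls} {x_center:.6f} {y_center:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# Example: one annotation row from a 1920x1080 image
print(visdrone_to_yolo_line("684,8,273,116,1,4,0,0".split(","), 1920, 1080))
```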

miladnasiri commented 1 year ago

Hi @Gareth1995, did you train this dataset with YOLOv7? I am still trying to convert the annotations to the YOLOv7 format.

Gareth1995 commented 1 year ago

@miladnasiri I've only been working with YOLOv5, unfortunately.

miladnasiri commented 1 year ago

Thanks for the reply. Because the formats for YOLOv5 and YOLOv7 are the same, if you could share the script you used to convert the VisDrone dataset to the YOLOv5 format, that would be enough for me; I can do the rest. I just cannot convert the dataset annotations into annotations suitable for training with YOLOv5 or YOLOv7.


adityatandon commented 1 year ago

Hi @miladnasiri ,

I've attached a ZIP file with the annotations for the VisDrone dataset in both formats: the original format (under the annotations folder) and the converted YOLO format (under the labels folder). I used the YOLOv5 VisDrone YAML file for the conversion, which is available here.

Please keep in mind that the converted class labels in the YOLO format are in the 0-9 range.

I have trained on the VisDrone dataset with both the YOLOv5 and YOLOv7 detectors using these labels, and it works fine. Hope this is of help to you. VisDrone-YOLO.zip

miladnasiri commented 1 year ago

Hi @adityatandon, thanks for your help. The first link is not working. Did you also work with Task 2 of the VisDrone dataset (the video dataset)?

adityatandon commented 1 year ago

Hey @miladnasiri, I just updated my earlier comment so the link works now. No, I only attempted the detection task with images and did not do the SOT and MOT tasks with videos.

miladnasiri commented 1 year ago

Hi @adityatandon, can you please share the weights (trained with YOLOv7)? This is my email: miladnassiri92@gmail.com

nihanaltaytas commented 1 year ago

> I've attached a ZIP file with the annotations for the VisDrone dataset in both the original format and the converted YOLO format. I used the YOLOv5 VisDrone YAML file for the conversion. [...] VisDrone-YOLO.zip

How did you use the YAML file? I am trying to do single-object tracking.

adityatandon commented 1 year ago

It was recently brought to my attention that the annotation files I had uploaded were incomplete. I have since fixed these annotations, and if you're looking for a better way to convert them with a cleaner script, I've made one available.

I've added the code to convert the annotations, as well as the annotations in the YOLO format, to my GitHub repository here. Feel free to use that for your experiments if you'd like.

EbubekirGONEY commented 1 year ago

> I've attached a ZIP file with the annotations for the VisDrone dataset in both the original format and the converted YOLO format. I used the YOLOv5 VisDrone YAML file for the conversion. [...] VisDrone-YOLO.zip

Will I encounter any errors if I try it with YOLOv3?

EbubekirGONEY commented 1 year ago

Hey @adityatandon, can you help me?

Will I encounter any errors if I try it with YOLOv3?

bad-engineer commented 1 year ago

> The DET submission format, as described by the authors, is as follows: <bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion> [...]

189,122,533,23,14,29,1,1,0,0
190,122,533,24,14,29,1,1,0,0

The MOT test dataset format looks like this. Does anyone know what it means? I think it is [frame, ID, bb_top_left_x, bb_top_left_y, width, height, conf, class_label, unknown, unknown], but I can't find it online. Can anyone help?
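
For what it's worth, a sketch that splits such a line under the field order guessed above; treat the column names as an assumption rather than a confirmed specification, and the last two values are deliberately left uninterpreted:

```python
# Sketch only: parse one MOT-style line under the guessed field order
# [frame, ID, bb_left, bb_top, width, height, conf, class_label, ?, ?].
def parse_mot_line(line):
    values = [int(v) for v in line.strip().split(",")]
    frame, target_id, left, top, width, height, conf, category = values[:8]
    extras = values[8:]  # meaning of the trailing columns is unverified here
    return {
        "frame": frame,
        "id": target_id,
        "bbox_ltwh": (left, top, width, height),
        "conf": conf,
        "category": category,
        "extras": extras,
    }

print(parse_mot_line("189,122,533,23,14,29,1,1,0,0"))
```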

Ainecop commented 1 year ago

> The MOT test dataset format looks like this: 189,122,533,23,14,29,1,1,0,0. Does anyone know what it means? I think it is [frame, ID, bb_top_left_x, bb_top_left_y, width, height, conf, class_label, unknown, unknown], but I can't find it online. Can anyone help? [...]

Have you found the answer to it?

VisDrone's video detection dev-test set has the following format: 98,0,808,1,47,22,1,4,0,0. I am not sure what it refers to.

fatbringer commented 10 months ago

Would the conversion code work for YOLOv8? And what if I just want to retain only persons? Which classes should I remove?
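
One way to do that, sketched under the 0-9 class mapping discussed earlier in this thread (after the -1 shift, pedestrian becomes class 0 and people becomes class 1); the folder names below are placeholders, not part of any official script:

```python
# Sketch: keep only the person-related classes in already-converted YOLO label files.
from pathlib import Path

KEEP = {0, 1}          # 0 = pedestrian, 1 = people (in the 0-9 YOLO mapping)
REMAP = {0: 0, 1: 1}   # set e.g. {0: 0, 1: 0} to merge both into one "person" class

src = Path("labels")          # assumed folder of converted YOLO .txt files
dst = Path("labels_person")   # output folder for the filtered labels
dst.mkdir(exist_ok=True)

for txt in src.glob("*.txt"):
    kept = []
    for line in txt.read_text().splitlines():
        parts = line.split()
        if parts and int(parts[0]) in KEEP:
            parts[0] = str(REMAP[int(parts[0])])
            kept.append(" ".join(parts))
    (dst / txt.name).write_text("\n".join(kept) + ("\n" if kept else ""))
```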