VisDrone / VisDrone-Dataset

The dataset for drone-based detection and tracking is released, including both images/videos and annotations.

How to interpret annotation file values in the Object Detection in Videos task? #22

Open Haebuk opened 2 years ago

Haebuk commented 2 years ago

I downloaded the Task 2 dataset and unzipped it, and the annotation files have the format below:

1,0,593,43,174,190,0,0,0,0
2,0,592,43,174,189,0,0,0,0
3,0,592,43,174,189,0,0,0,0
4,0,592,43,174,189,0,0,0,0
5,0,592,43,174,189,0,0,0,0
...

I found the following description:

 <bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>

        Name                                                  Description
 ----------------------------------------------------------------------------------------------------------------------------------
    <bbox_left>       The x coordinate of the top-left corner of the predicted bounding box

    <bbox_top>        The y coordinate of the top-left corner of the predicted object bounding box

    <bbox_width>      The width in pixels of the predicted object bounding box

    <bbox_height>     The height in pixels of the predicted object bounding box

      <score>         The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing
                      an object instance.
                      The score in the GROUNDTRUTH file is set to 1 or 0: 1 indicates the bounding box is considered in
                      evaluation, while 0 indicates the bounding box will be ignored.

  <object_category>   The object category indicates the type of annotated object, i.e., ignored regions (0), pedestrian (1),
                      people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9),
                      motor (10), others (11)

   <truncation>       The score in the DETECTION result file should be set to the constant -1.
                      The score in the GROUNDTRUTH file indicates the degree to which object parts appear outside the frame,
                      i.e., no truncation = 0 (truncation ratio 0%) and partial truncation = 1 (truncation ratio 1% ~ 50%).

    <occlusion>       The score in the DETECTION file should be set to the constant -1.
                      The score in the GROUNDTRUTH file indicates the fraction of the object being occluded, i.e.,
                      no occlusion = 0 (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%),
                      and heavy occlusion = 2 (occlusion ratio 50% ~ 100%).

But this description does not match the video annotations, which have ten values per line. How should I interpret them? Thank you.

DiegoLigtenberg commented 2 years ago

Do you already have an answer? I'm desperately trying to make this file format work, but I just don't understand it.

Haebuk commented 2 years ago

@DiegoLigtenberg Not yet :(

Haebuk commented 2 years ago

@DiegoLigtenberg I found it here:

 <frame_index>,<target_id>,<bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>

        Name                                                  Description
 ----------------------------------------------------------------------------------------------------------------------------------
   <frame_index>      The frame index of the video frame

    <target_id>       In the DETECTION result file, the identity of the target should be set to the constant -1.
                      In the GROUNDTRUTH file, the identity of the target is used to provide the temporal correspondence
                      of the bounding boxes in different frames.

    <bbox_left>       The x coordinate of the top-left corner of the predicted bounding box

    <bbox_top>        The y coordinate of the top-left corner of the predicted object bounding box

    <bbox_width>      The width in pixels of the predicted object bounding box

    <bbox_height>     The height in pixels of the predicted object bounding box

      <score>         The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing
                      an object instance.
                      The score in the GROUNDTRUTH file is set to 1 or 0: 1 indicates the bounding box is considered in
                      evaluation, while 0 indicates the bounding box will be ignored.

  <object_category>   The object category indicates the type of annotated object, i.e., ignored regions (0), pedestrian (1),
                      people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9),
                      motor (10), others (11)

   <truncation>       The score in the DETECTION file should be set to the constant -1.
                      The score in the GROUNDTRUTH file indicates the degree to which object parts appear outside the frame,
                      i.e., no truncation = 0 (truncation ratio 0%) and partial truncation = 1 (truncation ratio 1% ~ 50%).

    <occlusion>       The score in the DETECTION file should be set to the constant -1.
                      The score in the GROUNDTRUTH file indicates the fraction of the object being occluded, i.e.,
                      no occlusion = 0 (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%),
                      and heavy occlusion = 2 (occlusion ratio 50% ~ 100%).
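
Under this format, the first sample line from the question reads as: frame 1, target id 0, a 174x190 box at (593, 43), score 0 (ignored in evaluation), category 0 (ignored region), no truncation, no occlusion. A minimal parsing sketch in Python (the field names are mine, taken from the table above):

# Parse one line of a VisDrone MOT groundtruth file into named integer fields.
FIELDS = ["frame_index", "target_id", "bbox_left", "bbox_top",
          "bbox_width", "bbox_height", "score", "object_category",
          "truncation", "occlusion"]

def parse_line(line):
    return dict(zip(FIELDS, map(int, line.strip().split(","))))

print(parse_line("1,0,593,43,174,190,0,0,0,0"))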

RoyCopter commented 2 years ago

(quotes the annotation format description from the previous comment)

Do you have any idea how to convert it to YOLOv5 annotations?

saadhimmi commented 2 years ago

@RoyCopter, you can write a simple script:

For each sequence (each txt file):
- Load the annotation file
- Extract the unique frame_id values (pd.unique or np.unique)
- Create bbox_center_x and bbox_center_y columns (e.g. bbox_center_x = bbox_left + bbox_width/2)
- Read and store the image width w and height h
- For each frame_id:
  - Select only the lines with that frame_id from the annotation file
  - Divide the bbox_center_x and bbox_width columns by w
  - Divide the bbox_center_y and bbox_height columns by h
  - Save a txt file with ['object_category', 'bbox_center_x', 'bbox_center_y', 'bbox_w', 'bbox_h']

This is just a simple example that completely ignores the truncation and occlusion information. You could use those columns to further filter which annotations you keep (or mark heavily occluded objects as an 'ignored' class).
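
Here is a rough Python sketch of that outline, not an official converter. The directory layout (one annotations/<seq>.txt per sequence, frames named 0000001.jpg under sequences/<seq>/) and the class shift (VisDrone categories 1-11 remapped to 0-10, dropping ignored regions) are assumptions you may need to adapt to your download:

from pathlib import Path
from PIL import Image  # only used to read the image size

def convert_sequence(ann_file: Path, seq_dir: Path, out_dir: Path) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    per_frame = {}  # frame index -> list of kept boxes
    for line in ann_file.read_text().splitlines():
        frame, tid, left, top, w, h, score, cat, trunc, occ = map(int, line.split(","))
        if score == 0 or cat == 0:  # skip ignored boxes and ignored regions
            continue
        per_frame.setdefault(frame, []).append((cat, left, top, w, h))
    for frame, boxes in per_frame.items():
        # Assumed frame naming: 0000001.jpg, 0000002.jpg, ...
        img_w, img_h = Image.open(seq_dir / f"{frame:07d}.jpg").size
        lines = []
        for cat, left, top, w, h in boxes:
            # YOLO format: class x_center y_center width height, all normalized
            xc = (left + w / 2) / img_w
            yc = (top + h / 2) / img_h
            lines.append(f"{cat - 1} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}")
        (out_dir / f"{frame:07d}.txt").write_text("\n".join(lines))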

RoyCopter commented 2 years ago

(quotes the script outline from the previous comment)

Thanks!

ganesh0074 commented 1 year ago

It's available in VisDrone.yaml.

fatbringer commented 1 year ago

How should I do it if I want to display the bounding boxes and also the targets' annotation IDs?

ganesh0074 commented 1 year ago

You need to convert the given annotations into the required format to get the bounding boxes; there is a function which converts the annotations into the correct format.
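
For the display part, a minimal sketch with OpenCV (the image path is a placeholder, and boxes is assumed to hold (target_id, bbox_left, bbox_top, bbox_width, bbox_height) tuples for one frame, e.g. collected with the parse_line helper sketched earlier in the thread):

import cv2

def draw_frame(img_path, boxes):
    # boxes: list of (target_id, bbox_left, bbox_top, bbox_width, bbox_height)
    img = cv2.imread(img_path)
    for tid, left, top, w, h in boxes:
        # Draw the box and put the target id just above its top-left corner.
        cv2.rectangle(img, (left, top), (left + w, top + h), (0, 255, 0), 2)
        cv2.putText(img, str(tid), (left, top - 4),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    return img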

fatbringer commented 1 year ago

@Ganyesh ooh, where might I find the function? I haven't been able to find it at all.

I'm currently doing it myself by reading the text file line by line and assigning the values as follows:

ID, frame_no, bbox_x, bbox_y, bbox_w, bbox_h, score, obj_class, trunc, occlu = line.split(",")

It seems that the annotation text files are different for each sub-dataset. How do we get around this? I am currently working on the VisDrone MOT dataset.

ganesh0074 commented 1 year ago

python train.py --data VisDrone.yaml --epochs 300 --weights '' --cfg yolov5n.yaml --batch-size 128

ganesh0074 commented 1 year ago

@fatbringer were you able to get into it?

fatbringer commented 1 year ago

Hi @Ganyesh, thanks for checking in. Yes, I have solved it. It turns out the correct sequence is frame_no, ID, bbox_x, bbox_y, bbox_w, bbox_h, score, obj_class, trunc, occlu.
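
That is, the corrected unpacking for the MOT annotations (a minimal sketch, matching the format table earlier in the thread):

frame_no, ID, bbox_x, bbox_y, bbox_w, bbox_h, score, obj_class, trunc, occlu = line.strip().split(",")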