David0tt / DeepGTAV

A system to easily extract ground truth training data for different machine learning tasks from GTAV
GNU General Public License v3.0
89 stars · 9 forks

image missing in train set jpg data dgta-seadronesee #4

Closed fabiopoiesi closed 2 years ago

fabiopoiesi commented 2 years ago

Hi and thanks for sharing this useful project. Great job!

I've downloaded the dgta-seadronesee data (jpg folder + labels + metadata) and unzipped the files. Please see the item counts of the respective directories in the screenshots below.

Train: [Screenshot from 2022-05-23 09-44-29 showing the train directory listing]

Val: [Screenshot from 2022-05-23 09-44-40 showing the val directory listing]

The images directory of the train set is one item short.

Can you please check which file is missing?

I am going to create a script to convert the annotations to COCO format. I guess I have to resize the original bboxes if I want to use them with the jpg images, right?

C-der-Baum commented 2 years ago

I'll have a look. I'll get back to you in a couple of hours.

Ben93kie commented 2 years ago

I'm currently uploading an updated version of the train part. Please let me know if there are still any issues, such as a missing image. You are right, you have to resize the original bounding boxes to 1/2 to use them with the jpg images.
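A minimal sketch of that rescaling (not from the thread), assuming the per-image label .txt files store absolute pixel boxes as "class x y w h" for the full-resolution frames; the helper name, file layout, and label format are assumptions:

import glob
import os

def halve_labels(label_dir, out_dir, scale=0.5):
    # Assumed format: one "class x y w h" line per object, in pixels at the
    # original resolution; scaling by 0.5 matches the half-resolution jpgs.
    os.makedirs(out_dir, exist_ok=True)
    for fn in glob.glob(os.path.join(label_dir, '*.txt')):
        scaled = []
        for line in open(fn):
            parts = line.split()
            if not parts:
                continue
            cls, coords = parts[0], [float(v) * scale for v in parts[1:]]
            scaled.append(' '.join([cls] + ['%.1f' % v for v in coords]))
        with open(os.path.join(out_dir, os.path.basename(fn)), 'w') as f:
            f.write('\n'.join(scaled) + '\n')

This is only a sketch under the assumed label format; for a COCO conversion, the halved x, y, w, h values would go directly into each annotation's bbox field.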

fabiopoiesi commented 2 years ago

Thank you.

fabiopoiesi commented 2 years ago

Have you had a chance to upload the dataset?

Ben93kie commented 2 years ago

Unfortunately, there was a connection issue and the upload got canceled. I'm uploading it again; it is at 10% and should be finished in 3-4 hours.

Ben93kie commented 2 years ago

I just rechecked and I actually have 90k images in that folder. I'm still uploading a new version, which may resolve the problem of a potentially corrupt file. If the new version does not resolve the problem, can you send me the name of the missing or corrupt file (by comparing against the label files)?

FYI, I also uploaded label JSONs that should be in COCO format.

fabiopoiesi commented 2 years ago

By running this code I obtained the following id:

import glob
import os

source_dir = '/path/to/dgta-seadronesee'  # adjust to the dataset root
fns_labels = glob.glob(os.path.join(source_dir, 'train', 'labels', '*.txt'))
fns_images = glob.glob(os.path.join(source_dir, 'train', 'images', '*.jpg'))
# Image ids are the file names without their extension.
image_id_images = [os.path.basename(fn).split('.')[0] for fn in fns_images]
# Print every label id that has no matching image.
for fn in fns_labels:
    image_id = os.path.basename(fn).split('.')[0]
    if image_id not in image_id_images:
        print(image_id)

id -> 0020_0000168908

fabiopoiesi commented 2 years ago

If you share that image, I can just add it to the dataset so I don't have to re-download everything.

C-der-Baum commented 2 years ago

0020_0000168908

fabiopoiesi commented 2 years ago

Thank you!