DIUx-xView / xView1_baseline

Baseline models, scoring, and inference for the xView 2018 Challenge (i.e., xview1)
Apache License 2.0

Missing files, extra file and undefined type_ids. #3

Closed crikeli closed 6 years ago

crikeli commented 6 years ago

Hello, I have been doing some exploratory data analysis and found three issues with the dataset, one major and two minor. The major one: the train_images directory should contain a '1395.tif', because the JSON has bounding box values for that image, but the file is missing.

The two minor issues are: 1) there are 3 extra files in the train_images directory, namely '._100.tif', '._102.tif' and '_109.tif'; 2) the JSON file contains two 'type_id' values that are not listed in the 'xview_class_labels.txt' file: 75 and 82. I tried plotting the bounding boxes, and category 82 looks like gibberish while category 75 seems to be random.
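For anyone who wants to reproduce these three checks, here is a minimal Python sketch. It assumes each annotation record is a dict with a `properties` mapping holding `image_id` and `type_id`; the `image_id` key and the function name `audit_annotations` are assumptions made for illustration, not names confirmed in this thread.

```python
from collections import Counter


def audit_annotations(features, image_files, known_type_ids):
    """Compare annotation records against files on disk and the label map.

    features:       list of dicts shaped like {"properties": {"image_id": ..., "type_id": ...}}
                    (assumed layout of the annotation JSON)
    image_files:    iterable of file names found in the image directory
    known_type_ids: set of integer ids listed in the class label file
    """
    annotated = {f["properties"]["image_id"] for f in features}
    on_disk = set(image_files)
    type_counts = Counter(int(f["properties"]["type_id"]) for f in features)

    return {
        # Images referenced by annotations but absent from the directory.
        "missing_images": sorted(annotated - on_disk),
        # Files in the directory that no annotation refers to.
        "unannotated_files": sorted(on_disk - annotated),
        # type_id values that have no entry in the label map.
        "undefined_type_ids": sorted(t for t in type_counts if t not in known_type_ids),
    }
```

Note that `unannotated_files` will also list any genuine image that happens to have no annotations, so it is a list of candidates to inspect, not a list of confirmed extras.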

Looking forward to your reply.

dkust commented 6 years ago

Thanks for calling attention to these. We're investigating 1395.tif and classes 75 & 82. The "underscore" files are just an artifact of filesystem attributes on macOS and *nix systems, and you can safely ignore them.
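One way to ignore those files when listing the image directory is sketched below. It assumes the stray files can be recognized by the `._` prefix, as in '._100.tif'; a name like '_109.tif' would not match that pattern and would need separate handling. The helper name `list_image_files` is hypothetical.

```python
from pathlib import Path


def list_image_files(image_dir):
    """Return the .tif file names in image_dir, skipping names that start with '._'."""
    return sorted(
        p.name
        for p in Path(image_dir).iterdir()
        if p.is_file()
        and p.suffix.lower() == ".tif"
        and not p.name.startswith("._")  # assumed marker of the metadata sidecar files
    )
```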

dkust commented 6 years ago

It looks like classes 75 and 82 are artifacts of the labeling process and do not correspond to any meaningful class that needs to be detected for this challenge. You can either treat those two classes as "label noise" and have models handle them implicitly, or ignore them.
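If you go with the second option, a minimal sketch of dropping those annotations before training could look like this. It assumes each record is a dict with a `properties` mapping that holds `type_id`; the names `NOISE_TYPE_IDS` and `drop_noise_classes` are hypothetical.

```python
# The two type_id values reported in this thread as having no class label.
NOISE_TYPE_IDS = frozenset({75, 82})


def drop_noise_classes(features, noise_ids=NOISE_TYPE_IDS):
    """Return only the annotation records whose type_id is not in noise_ids."""
    return [
        f for f in features
        if int(f["properties"]["type_id"]) not in noise_ids
    ]
```

The input list is left unmodified, so the original annotations remain available if you later decide to keep the two classes as label noise instead.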

crikeli commented 6 years ago

Awesome. Thanks @dkust

crikeli commented 6 years ago

Oops, I realized you are probably still investigating the 1395.tif image issue. I will leave this open for the time being.

dkust commented 6 years ago

Sorry for the dangling response re: 1395.tif. It is not in the training set; not sure why it appears in the JSON.