aimotive / aimotive_dataset

aiMotive public dataset
https://openreview.net/forum?id=LW3bRLlY-SA

Classes in train and val splits #7

Open ArseniuML opened 1 month ago

ArseniuML commented 1 month ago

I tried to investigate which object types are annotated in the train and val splits:

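import json

# AiMotiveDataset (this repository's loader class) and data_root are assumed to be defined already.
train_dataset = AiMotiveDataset(data_root, 'train')
train_classes = set()
# Collect every ObjectType that appears in the train annotation files.
for anno in train_dataset.dataset_index:
    with open(anno, 'r') as f:
        j = json.load(f)
    train_classes.update(obj['ObjectType'] for obj in j['CapturedObjects'])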

train_classes

{'BICYCLE',
 'BUS',
 'CAR',
 'MOTORCYCLE',
 'PEDESTRIAN',
 'RIDER',
 'TRAIN',
 'TRUCK'}

val_dataset = AiMotiveDataset(data_root, 'val')
val_classes = set()
# Collect every ObjectType that appears in the val annotation files.
for anno in val_dataset.dataset_index:
    with open(anno, 'r') as f:
        j = json.load(f)
    val_classes.update(obj['ObjectType'] for obj in j['CapturedObjects'])

val_classes

{'BICYCLE',
 'BUS',
 'CAR',
 'MOTORCYCLE',
 'OTHER-OBJECT',
 'OTHER-RIDEABLE',
 'PEDESTRIAN',
 'PICKUP',
 'SHOPPING-CART',
 'TRAILER',
 'TRUCK',
 'VAN'}

Why are there classes in the validation split that are not in the training split (VAN, TRAILER, SHOPPING-CART, PICKUP, OTHER-OBJECT, OTHER-RIDEABLE)? And why are there classes in the train split that are not in the val split (TRAIN, RIDER)?
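
For reference, the exact differences between the two splits can be computed with set operations. The sketch below hard-codes the two sets printed above so that it runs without the dataset; the names `only_in_val`, `only_in_train`, and `shared` are illustrative.

```python
# Class sets copied from the outputs printed above.
train_classes = {'BICYCLE', 'BUS', 'CAR', 'MOTORCYCLE',
                 'PEDESTRIAN', 'RIDER', 'TRAIN', 'TRUCK'}
val_classes = {'BICYCLE', 'BUS', 'CAR', 'MOTORCYCLE', 'OTHER-OBJECT',
               'OTHER-RIDEABLE', 'PEDESTRIAN', 'PICKUP', 'SHOPPING-CART',
               'TRAILER', 'TRUCK', 'VAN'}

only_in_val = val_classes - train_classes    # annotated in val, never in train
only_in_train = train_classes - val_classes  # annotated in train, never in val
shared = train_classes & val_classes         # present in both splits

print(sorted(only_in_val))
# ['OTHER-OBJECT', 'OTHER-RIDEABLE', 'PICKUP', 'SHOPPING-CART', 'TRAILER', 'VAN']
print(sorted(only_in_train))
# ['RIDER', 'TRAIN']
print(sorted(shared))
# ['BICYCLE', 'BUS', 'CAR', 'MOTORCYCLE', 'PEDESTRIAN', 'TRUCK']
```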

TamasMatuszka commented 1 month ago

Hi @ArseniuML,

The discrepancy between the train and val classes arises because the training set was generated by an automatic annotation method with a limited set of output classes, while the val set was labeled by human annotators using a broader set of classes.

You are right that a supervised model trained on the dataset cannot detect classes that appear in the validation set but not in the training set. However, this also means that open-vocabulary methods can be benchmarked on the aiMotive dataset.
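
One way to act on this when evaluating a supervised model is to score classes seen during training separately from classes that occur only in the val split. The following is a minimal sketch, not part of the dataset tooling: `split_by_seen_classes` is a hypothetical helper, and it assumes each annotated object is a dict with an `ObjectType` key, as in the snippets earlier in this thread.

```python
# Classes present in the train split, copied from the output earlier in the thread.
TRAIN_CLASSES = frozenset({'BICYCLE', 'BUS', 'CAR', 'MOTORCYCLE',
                           'PEDESTRIAN', 'RIDER', 'TRAIN', 'TRUCK'})


def split_by_seen_classes(objects, seen_classes=TRAIN_CLASSES):
    """Partition annotation objects by whether their class occurs in training.

    Hypothetical helper: `objects` is any iterable of dicts with an
    'ObjectType' key (the format of the entries in 'CapturedObjects').
    """
    seen, unseen = [], []
    for obj in objects:
        if obj['ObjectType'] in seen_classes:
            seen.append(obj)
        else:
            unseen.append(obj)
    return seen, unseen


# Toy annotation list standing in for j['CapturedObjects'] from a val file.
captured = [{'ObjectType': 'CAR'},
            {'ObjectType': 'VAN'},
            {'ObjectType': 'PEDESTRIAN'}]
seen, unseen = split_by_seen_classes(captured)
print([o['ObjectType'] for o in seen])    # ['CAR', 'PEDESTRIAN']
print([o['ObjectType'] for o in unseen])  # ['VAN']
```

With such a split, a closed-set detector could be evaluated on the `seen` objects only, while the `unseen` objects would serve as the targets for an open-vocabulary benchmark.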