DIUx-xView / xview3-reference

Reference data processing code and model for the xView3 prize challenge.

Error in provided chipping/labeling scheme #15

Open laserstonewall opened 2 years ago

I believe there may be an error in the reference implementation, in the __init__() method of the XView3Dataset class in dataloader.py, that causes a significant number of samples to be incorrectly labeled as class 1/FISHING.

The results hold for both the tiny and full datasets. If we look at the chipping annotations csv generated for the tiny validation set:

import numpy as np
import pandas as pd

tiny_valid_chips = pd.read_csv('/home/ubuntu/xview3/process_uuid/validation/val_chip_annotations.csv')

print(tiny_valid_chips[['scene_id', 'chip_index', 'is_vessel', 'is_fishing', 'confidence', 'vessel_class']].head())

# Output:

            scene_id  chip_index is_vessel is_fishing confidence  vessel_class
0  b1844cde847a3942v         333      True        NaN     MEDIUM             1
1  b1844cde847a3942v         303      True        NaN        LOW             1
2  b1844cde847a3942v         824     False        NaN       HIGH             3
3  b1844cde847a3942v         404      True        NaN     MEDIUM             1
4  b1844cde847a3942v         404     False        NaN       HIGH             3

In the first row, is_vessel is True and is_fishing is NaN. This should result in a label of 2/NONFISHING, but the label ends up as 1/FISHING. If we group on these columns (after filling in NaN values, since pandas by default drops rows whose groupby keys contain NaN), we get:

groupby_cols = ['is_vessel', 'is_fishing', 'confidence', 'vessel_class']

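tst = tiny_valid_chips[groupby_cols]
tst = tst.fillna('Null')  # placeholder so rows with NaN keys are not dropped
print(tst.groupby(groupby_cols).size())  # counting call not shown in the post; .size() is one way to get counts of this shape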

# Output

is_vessel  is_fishing  confidence  vessel_class
False      Null        HIGH        3               400
                       MEDIUM      3                26
True       False       HIGH        2               272
                       MEDIUM      2                 3
           True        HIGH        1                19
           Null        LOW         1               329
                       MEDIUM      1               303
Null       Null        LOW         1                10

So we can see that cases where is_vessel is True and is_fishing is NaN are always labeled as 1/FISHING. This occurs for both LOW and MEDIUM confidence labels in the tiny set. Additionally, cases where both is_vessel and is_fishing are NaN are also labeled as 1/FISHING.
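
As an alternative to the fillna('Null') placeholder, pandas 1.1 and later accept dropna=False in groupby, which keeps NaN keys as their own groups. A minimal sketch on a made-up toy frame (the column names follow the CSV above, but the values are illustrative and are not taken from the challenge data):

```python
import numpy as np
import pandas as pd

# Toy stand-in for a chip annotations table (illustrative values only).
toy = pd.DataFrame(
    {
        "is_vessel": [True, True, False, np.nan],
        "is_fishing": [True, np.nan, np.nan, np.nan],
        "vessel_class": [1, 1, 3, 1],
    }
)

cols = ["is_vessel", "is_fishing", "vessel_class"]

# Default behaviour: any row with NaN in a grouping key is silently dropped.
default_counts = toy.groupby(cols).size()

# dropna=False keeps NaN as its own group, so no placeholder string is needed.
full_counts = toy.groupby(cols, dropna=False).size()

print("rows counted by default:", default_counts.sum())
print("rows counted with dropna=False:", full_counts.sum())
```

Comparing the two totals is a quick way to see how many rows a default groupby hides.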

If we do the same analysis for the chipping annotations csv generated for the tiny training set:

tiny_train_chips = pd.read_csv('/home/ubuntu/xview3/process_uuid/train/train_chip_annotations.csv')

groupby_cols = ['is_vessel', 'is_fishing', 'confidence', 'vessel_class']

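tst = tiny_train_chips[groupby_cols]
tst = tst.fillna('Null')  # placeholder so rows with NaN keys are not dropped
print(tst.groupby(groupby_cols).size())  # counting call not shown in the post; .size() is one way to get counts of this shape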

# Output

is_vessel  is_fishing  confidence  vessel_class
-1         -1          -1          0               260
False      Null        HIGH        3                34
Null       Null        LOW         1               111
True       False       MEDIUM      2               184
           True        MEDIUM      1               194

We can see that cases where is_vessel is True and is_fishing is NaN don't occur in this set. However, cases where both columns are NaN do, and they are labeled as 1/FISHING.
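
One way to surface such rows directly is to select every chip labeled class 1 whose is_fishing value is not actually True. A small sketch on a hypothetical toy frame (column names follow the CSV above; the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy stand-in for a chip annotations table (illustrative values only).
toy = pd.DataFrame(
    {
        "is_vessel": [True, True, np.nan, False],
        "is_fishing": [True, np.nan, np.nan, np.nan],
        "vessel_class": [1, 1, 1, 3],
    }
)

# eq(True) is False for NaN, so ~eq(True) catches both False and missing values.
labeled_fishing = toy["vessel_class"] == 1
confirmed_fishing = toy["is_fishing"].eq(True)
suspect = toy[labeled_fishing & ~confirmed_fishing]

print(suspect)
```

The same mask could be applied to the real annotation CSVs to count affected rows per scene.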

The issue seems to be in the loop in lines 271 - 277:

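self.detections = pd.read_csv(detect_file, low_memory=False)
vessel_class = []
for ii, row in self.detections.iterrows():
    if row.is_vessel and row.is_fishing:
        ...  # body not shown here; rows reaching this branch end up as class 1/FISHING
    elif row.is_vessel and not row.is_fishing:
        ...  # body not shown here; rows reaching this branch end up as class 2/NONFISHING
    elif not row.is_vessel:
        ...  # body not shown here; rows reaching this branch end up as class 3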

The first conditional statement,

if row.is_vessel and row.is_fishing:

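is meant to test whether both is_vessel and is_fishing are True. However, NaN also evaluates to True: pandas represents a missing value as a floating-point NaN, and Python treats any nonzero float, NaN included, as truthy in a boolean context. Here is an example data point: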

tst = tiny_valid_chips[(tiny_valid_chips['is_vessel']==True) & (tiny_valid_chips['is_fishing'].isnull())]
example = tst.iloc[0]

print(f"Detect ID: {example['detect_id']}")
print(example[['is_vessel', 'is_fishing', 'vessel_class']])

if example['is_vessel']:
    print('Test 1')

if example['is_fishing']:
    print('Test 2')

if not np.isnan(example['is_fishing']):
    print('Test 3')

# Output

Detect ID: b1844cde847a3942v_006.46879283500000035190_003.47593584100000008164
is_vessel       True
is_fishing       NaN
vessel_class       1
Name: 0, dtype: object
Test 1
Test 2

So the first conditional statement will classify is_vessel/is_fishing combinations of True/True, NaN/NaN, True/NaN all as class 1/FISHING.
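
The behaviour can be reproduced without pandas, and checking for missing values before branching avoids it. The sketch below is only an illustration: the class ids 1, 2 and 3 follow the tables above, while the constant names, the helper names, and the missing_label parameter are assumptions of this example and not part of the reference implementation.

```python
import math

# Class ids as they appear in the tables above; the names are this sketch's own.
FISHING, NONFISHING, NONVESSEL = 1, 2, 3


def is_missing(value):
    """True for a float NaN, which is how pandas represents a missing cell."""
    return isinstance(value, float) and math.isnan(value)


def truthiness_label(is_vessel, is_fishing):
    """Same branching as the quoted loop: relies on plain truthiness."""
    if is_vessel and is_fishing:
        return FISHING
    elif is_vessel and not is_fishing:
        return NONFISHING
    elif not is_vessel:
        return NONVESSEL


def nan_aware_label(is_vessel, is_fishing, missing_label=None):
    """Check for missing values before branching.

    missing_label is a hypothetical knob: the value returned when the
    annotations do not determine the class. Choosing it is a policy question.
    """
    if is_missing(is_vessel):
        return missing_label
    if not is_vessel:
        return NONVESSEL
    if is_missing(is_fishing):
        return missing_label
    return FISHING if is_fishing else NONFISHING


nan = float("nan")
for pair in [(True, True), (True, False), (True, nan), (nan, nan), (False, nan)]:
    print(pair, truthiness_label(*pair), nan_aware_label(*pair))
```

Passing missing_label=NONFISHING would give True/NaN the label this issue expects, but it would also label NaN/NaN as NONFISHING, so the right default depends on how unknown detections are meant to be treated during training.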

For a dataset with a random subset of 300 of the training scenes chipped, the label distribution looks like:

is_vessel  is_fishing  confidence  vessel_class
False      Null        HIGH        3               5418
                       MEDIUM      3               1730
True       False       HIGH        2               2868
                       MEDIUM      2                 62
           True        HIGH        1                933
                       MEDIUM      1                 28
           Null        LOW         1               3315
                       MEDIUM      1               4751
Null       Null        LOW         1                119

So we end up with (4751 + 3315 + 119) = 8185 of the 19224 detections (about 42.6%) labeled as 1, seemingly incorrectly.
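
The arithmetic can be checked by re-adding the counts from the table above. In this sketch the group keys are kept as the strings printed in the table, and a row is treated as suspect when it is labeled class 1 without is_fishing being True:

```python
# (is_vessel, is_fishing, confidence, vessel_class) -> count, copied from the table above.
counts = {
    ("False", "Null", "HIGH", 3): 5418,
    ("False", "Null", "MEDIUM", 3): 1730,
    ("True", "False", "HIGH", 2): 2868,
    ("True", "False", "MEDIUM", 2): 62,
    ("True", "True", "HIGH", 1): 933,
    ("True", "True", "MEDIUM", 1): 28,
    ("True", "Null", "LOW", 1): 3315,
    ("True", "Null", "MEDIUM", 1): 4751,
    ("Null", "Null", "LOW", 1): 119,
}

# Class 1 rows whose is_fishing value is not actually True.
suspect = sum(
    n
    for (_, is_fishing, _, vessel_class), n in counts.items()
    if vessel_class == 1 and is_fishing != "True"
)
total = sum(counts.values())

print(suspect, total, f"{suspect / total:.1%}")  # 8185 19224 42.6%
```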