ch3cook-fdu / Vote2Cap-DETR

[CVPR 2023] Vote2Cap-DETR and [T-PAMI 2024] Vote2Cap-DETR++; A set-to-set perspective towards 3D Dense Captioning; State-of-the-Art 3D Dense Captioning methods

Why is the 'nyu40id2class' of Vote2Cap different from that of these detection methods? #8

Open linhaojia13 opened 8 months ago

linhaojia13 commented 8 months ago
## VoteNet

```
(Pdb) DC18.nyu40id2class
{3: 0, 4: 1, 5: 2, 6: 3, 7: 4, 8: 5, 9: 6, 10: 7, 11: 8, 12: 9, 14: 10, 16: 11, 24: 12, 28: 13, 33: 14, 34: 15, 36: 16, 39: 17}
(Pdb) len(DC18.nyu40id2class)
18
```

## Vote2Cap

```
(Pdb) self.dataset_config.nyu40id2class
{5: 2, 23: 17, 8: 5, 40: 17, 9: 6, 7: 4, 39: 17, 18: 17, 11: 8, 29: 17, 3: 0, 14: 10, 15: 17, 27: 17, 6: 3, 34: 15, 35: 17, 4: 1, 10: 7, 19: 17, 16: 11, 30: 17, 33: 14, 37: 17, 21: 17, 32: 17, 25: 17, 17: 17, 24: 12, 28: 13, 36: 16, 12: 9, 38: 17, 20: 17, 26: 17, 31: 17, 13: 17}
(Pdb) len(self.dataset_config.nyu40id2class)
37
```
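Comparing the two dicts printed above (a quick sketch using the literals copied from the pdb output) shows how they relate: the 18 NYU40 ids from the detection config keep their class indices, and every additional id collapses into class index 17:

```python
# Dict literals copied from the pdb output above.
votenet_map = {3: 0, 4: 1, 5: 2, 6: 3, 7: 4, 8: 5, 9: 6, 10: 7, 11: 8, 12: 9,
               14: 10, 16: 11, 24: 12, 28: 13, 33: 14, 34: 15, 36: 16, 39: 17}
vote2cap_map = {5: 2, 23: 17, 8: 5, 40: 17, 9: 6, 7: 4, 39: 17, 18: 17, 11: 8, 29: 17,
                3: 0, 14: 10, 15: 17, 27: 17, 6: 3, 34: 15, 35: 17, 4: 1, 10: 7, 19: 17,
                16: 11, 30: 17, 33: 14, 37: 17, 21: 17, 32: 17, 25: 17, 17: 17, 24: 12,
                28: 13, 36: 16, 12: 9, 38: 17, 20: 17, 26: 17, 31: 17, 13: 17}

# The 18 NYU40 ids shared with VoteNet keep the same class index.
assert all(vote2cap_map[i] == c for i, c in votenet_map.items())
# Every additional NYU40 id is collapsed into class index 17.
extra_ids = set(vote2cap_map) - set(votenet_map)
assert all(vote2cap_map[i] == 17 for i in extra_ids)
```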
ch3cook-fdu commented 8 months ago

The Scan2Cap task requires localizing instances of categories beyond the pre-defined 18 categories used in 3D detection.

To be specific, the categories in the original VoteNet implementation are:

```
{
    'cabinet':0, 'bed':1, 'chair':2, 'sofa':3, 'table':4, 'door':5,
    'window':6, 'bookshelf':7, 'picture':8, 'counter':9, 'desk':10, 'curtain':11,
    'refrigerator':12, 'showercurtrain':13, 'toilet':14, 'sink':15, 'bathtub':16, 'garbagebin':17
}
```

However, both ScanRefer and Nr3D contain annotations for objects such as "shoes", "monitors", and "TVs", which are common in these 3D environments.

We follow the same category definition as Scan2Cap.
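For reference, a minimal sketch of how such an extended mapping could be built, assuming the last detection class index (17) doubles as an "others" bucket and that the structural NYU40 ids for wall, floor, and ceiling (1, 2, 22) are excluded; the helper `build_extended_nyu40id2class` is hypothetical and not the repository's actual code:

```python
# The 18 detection classes and their NYU40 ids, as in the original VoteNet config.
NYU40IDS_18 = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 24, 28, 33, 34, 36, 39]

def build_extended_nyu40id2class(nyu40ids, others_class=17,
                                 excluded_ids=(1, 2, 22)):
    """Hypothetical helper: keep the class index of the known NYU40 ids and
    map every remaining id (except the excluded structural ones) to `others_class`."""
    id2class = {nyu_id: cls for cls, nyu_id in enumerate(nyu40ids)}
    for nyu_id in range(1, 41):
        if nyu_id in id2class or nyu_id in excluded_ids:
            continue
        id2class[nyu_id] = others_class
    return id2class

extended = build_extended_nyu40id2class(NYU40IDS_18)
print(len(extended))  # 37, matching the Vote2Cap mapping printed above
```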