Some questions about Oneformer3D

RayYoh commented 8 months ago

Hello authors, thanks for your great efforts, and congratulations on this work being accepted to CVPR 2024.

After reading the code, I have two questions about the implementation.

Instead of using superpoint features (here), have you tried using voxel features (or point features) directly to predict the masks? Does it hurt the final performance? Intuitively, it should give better results, since these features are finer-grained than superpoint features. I tried it in my own code (using SPFormer as the baseline) and found that the final mAP is lower than SPFormer's while AP50 and AP25 are higher (51.4 74.4 84.0 vs 51.4 74.4 84.0) on the val set.
Another question is whether there is any specific reason for choosing Mask3D's pre-trained backbone as the backbone for ScanNet200 (in my opinion, ScanNet v2 and ScanNet200 share the same data and differ only in their GT labels).

Looking forward to your reply.

Best, Ray

oneformer3d-contributor commented 8 months ago

Hi @RayYoh ,

We use superpoints on ScanNet and do not use them on S3DIS. The main reason is that the GT annotations on ScanNet are given for the same set of superpoints, so using them boosts segmentation metrics.
The only reason is that, at some point, starting from the Mask3D checkpoint on ScanNet200 gave slightly better metrics than starting from the SSTNet checkpoint.
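
For readers following the superpoint discussion, here is a minimal, dependency-free sketch of the pooling step in question: averaging per-voxel features within each superpoint (what a scatter_mean operation does). It is only an illustration, not the repository's implementation; the function below is a plain-Python stand-in, and the names voxel_features and superpoint_ids are assumptions made for this example.

```python
def scatter_mean(features, index, num_groups):
    """Average the feature rows that share the same group index."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_groups)]
    counts = [0] * num_groups
    for row, group in zip(features, index):
        counts[group] += 1
        for d, value in enumerate(row):
            sums[group][d] += value
    # Empty groups keep a zero vector instead of dividing by zero.
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]


# Four voxels with 2-D features, assigned to two hypothetical superpoints.
voxel_features = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 0.0]]
superpoint_ids = [0, 0, 1, 1]

superpoint_features = scatter_mean(voxel_features, superpoint_ids, num_groups=2)
print(superpoint_features)
```

Each superpoint ends up with one feature vector, so masks predicted from these features are constant within a superpoint. That matches annotations defined on the same superpoints, which is the effect described in the reply above.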

RayYoh commented 8 months ago

Hi, oneformer3d contributors @oneformer3d-contributor. Thanks for your kind reply. I understand Q2 now. But for Q1, in my training I changed the supervision signal from superpoint masks to per-point instance masks in Oneformer3D (here), which means I didn't use scatter_mean, and I remapped the predicted voxel masks to point masks.
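
To make the remapping step concrete, here is a rough sketch of expanding predicted voxel masks to per-point masks. It assumes the voxelization step yields, for every raw point, the index of the voxel it fell into; the names point_to_voxel and voxel_masks_to_point_masks are hypothetical and are not taken from the Oneformer3D or SPFormer code.

```python
def voxel_masks_to_point_masks(voxel_masks, point_to_voxel):
    """Expand masks of shape (num_instances, num_voxels)
    to shape (num_instances, num_points)."""
    # Every point inherits the mask value of the voxel it belongs to.
    return [[mask[v] for v in point_to_voxel] for mask in voxel_masks]


# Two predicted instance masks over three voxels.
voxel_masks = [[1, 0, 0], [0, 1, 1]]
# Five raw points; point_to_voxel[i] is the voxel containing point i.
point_to_voxel = [0, 0, 1, 2, 2]

point_masks = voxel_masks_to_point_masks(voxel_masks, point_to_voxel)
print(point_masks)
```

With this setup the loss can be computed against per-point instance labels directly, so no superpoint pooling is involved.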

filaPro commented 8 months ago

I think we didn't try our method on ScanNet without superpoints. Only on S3DIS. And I'm not sure if training w/o superpoints can improve metrics on ScanNet...
