Yang7879 / 3D-BoNet

🔥3D-BoNet in Tensorflow (NeurIPS 2019, Spotlight)
https://arxiv.org/abs/1906.01140
MIT License

Question about scannet #10

Open 519830100 opened 4 years ago

519830100 commented 4 years ago

Hi Bo, thanks for sharing the code. May I ask what the performance gap is between the val and test sets, so I know roughly where I stand? Another question: what if PointNet++ is used for semantic segmentation rather than sparse conv? Does this have a big influence on the final results?

Yang7879 commented 4 years ago

Hi @519830100, the score on the val split is a few points lower than the final score on the test set. This is sensible, as the val set has 300 scenes while the test split has only 100 scenes. I didn't systematically run full experiments on the test/val sets using PointNet++ after I observed that its semantic prediction is really bad. There is no doubt that bad semantic results will ruin the final score, because the evaluation metric is based on [semantic instance] segmentation. This metric requires the category of each instance to be correct first; otherwise, any instance segmentation is wrong.
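To see why bad semantics ruin the score, here is a minimal sketch (not the official ScanNet evaluator; function names are illustrative) of the true-positive rule used by semantic-instance metrics: a predicted instance only counts if its category matches the ground truth AND the mask overlap is sufficient.

```python
# Hedged sketch of semantic-instance matching: a wrongly labeled instance is
# rejected outright, no matter how perfect its mask is.

def instance_is_true_positive(pred_label, pred_mask, gt_label, gt_mask,
                              iou_threshold=0.5):
    """pred_mask / gt_mask: sets of point indices belonging to the instance."""
    if pred_label != gt_label:
        # Wrong category: rejected before any geometric check.
        return False
    inter = len(pred_mask & gt_mask)
    union = len(pred_mask | gt_mask)
    iou = inter / union if union else 0.0
    return iou >= iou_threshold

# A perfectly segmented chair labeled as "table" still counts as wrong:
pred = {1, 2, 3, 4}
gt = {1, 2, 3, 4}
print(instance_is_true_positive("table", pred, "chair", gt))  # False
print(instance_is_true_positive("chair", pred, "chair", gt))  # True
```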

BigDeviltjj commented 4 years ago

Hi @Yang7879, may I ask where you used sparse conv rather than PointNet++ in your code? I only found that PointNet++ was used as the backbone in main_3D_BoNet.py.

Yang7879 commented 4 years ago

Hi @BigDeviltjj, the sparse conv is used to predict semantics independently rather than being integrated into our code. Here are the steps to reproduce the experiments:

https://github.com/Yang7879/3D-BoNet/issues/6#issuecomment-521896317
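Combining the two outputs can be sketched as follows. This is a hypothetical illustration (names are not from the repo): the independent per-point semantic predictions are merged with the predicted instance masks by giving each instance the majority semantic label of its points.

```python
import numpy as np

# Hypothetical merge step, assuming per-point semantic labels from an
# external network (e.g. SparseConvNet) and boolean instance masks from
# 3D-BoNet. Each instance takes the majority label of its points.

def label_instances(point_sem_labels, instance_masks):
    """point_sem_labels: (N,) int array of per-point semantic labels.
    instance_masks: list of boolean (N,) arrays, one per predicted instance.
    Returns one semantic label per instance (majority vote)."""
    instance_labels = []
    for mask in instance_masks:
        labels_in_instance = point_sem_labels[mask]
        counts = np.bincount(labels_in_instance)
        instance_labels.append(int(np.argmax(counts)))
    return instance_labels

# Tiny example: 6 points, 2 instances.
sem = np.array([0, 0, 5, 5, 5, 2])
masks = [np.array([True, True, False, False, False, False]),
         np.array([False, False, True, True, True, True])]
print(label_instances(sem, masks))  # [0, 5]
```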

BigDeviltjj commented 4 years ago

@Yang7879 Thank you for your answer. By the way, inferring bounding boxes and scores from one single global feature seems unreasonable to me (no offense). How did you come up with such a brilliant idea and achieve such a great result?

Yang7879 commented 4 years ago

Hi @BigDeviltjj, this is a fundamental question regarding the proposed pipeline.

When looking at raw 3D point clouds (with or without colors), we humans can easily identify clusters/subsets of points as individual objects, even at a single glance. Basically, we tend to use general geometry or appearance information, such as point density, continuity/discontinuity, point colors, etc., to roughly infer the boundaries of those clusters.

When designing the network, we therefore firmly believed that the network should be powerful enough to capture this general information (not details) in a global feature vector. The challenge, however, is how to design the loss functions that guide the network to capture those general but useful features. The proposed multi-criteria loss functions aim to do exactly that.

Overall, our pipeline teaches the network to roughly identify all existing objects at a single glance, similar to how we humans perceive the environment. This is the core difference from all existing work.
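The idea can be sketched numerically. Below is a conceptual illustration only (not the repo's TensorFlow code; all sizes and weights are placeholders): one global feature vector per scene is fed through a fully connected branch that directly regresses a fixed budget of H boxes (two corners each) plus one confidence score per box.

```python
import numpy as np

# Conceptual sketch of 3D-BoNet-style box regression from a single global
# feature. H, feat_dim, and the random weights are illustrative stand-ins
# for learned MLP branches.

rng = np.random.default_rng(0)
H = 24           # maximum number of instances per scene (illustrative)
feat_dim = 512   # global feature size (illustrative)

global_feature = rng.standard_normal(feat_dim)

# One linear layer standing in for each MLP branch:
W_box = rng.standard_normal((feat_dim, H * 6)) * 0.01
W_score = rng.standard_normal((feat_dim, H)) * 0.01

boxes = (global_feature @ W_box).reshape(H, 2, 3)           # (H, min/max corner, xyz)
scores = 1.0 / (1.0 + np.exp(-(global_feature @ W_score)))  # sigmoid, in (0, 1)

print(boxes.shape, scores.shape)  # (24, 2, 3) (24,)
```

Since the H predicted boxes have no fixed order, training must first associate each predicted box with a ground-truth box before the multi-criteria loss can be computed, which is what the box-association step in the paper handles.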

Thank you for your interest. We believe it's worthwhile, and we encourage the community to dive deeper and truly investigate this pipeline, and also to explore its applicability to 2D images (detection or instance segmentation).

zhixinwang commented 4 years ago

Hi @Yang7879, thanks for sharing the code. Could you tell me the exact results on the validation set of ScanNet? Do you know why most current methods have not reported their results on the ScanNet validation set? I want to estimate my possible position on the ScanNet test leaderboard. Thanks in advance for any help you can offer.

519830100 commented 4 years ago

@zhixinwang Hi zhixin, do you mind sharing your results here? I am also working on the same task.

96lives commented 4 years ago

@Yang7879 Do you have any ScanNet pretrained models? If you do, it would really be appreciated.

Yang7879 commented 4 years ago

@zhixinwang @519830100 @96lives We don't plan to release the code and model for ScanNet as it relies on the third-party SparseConv whose BSD License is not compatible with the MIT License we used.

Here are some more results on the ScanNet validation split. They include both good and bad predictions, covering simple and complex scenes; you may find them useful for qualitative comparison. Due to time limitations, I could not process all the data; thank you for understanding.

To download the results: https://drive.google.com/file/d/1cV07rP02Yi3Eu6GQxMR2buigNPJEvCq0/view

To visualize the results: python helper_data_scannet.py