chrischoy opened this issue 4 years ago
Hi @chrischoy, thanks for your interest in our paper. (1) The results can indeed differ from the paper if the model is trained again; they can be better or worse. For example, the released model for Area 5 (trained after the NeurIPS submission) is better than the paper, but the results may also be worse, as you reported. I suspect the primary reason is the Hungarian algorithm, which may introduce instability during training. It seems a more stable matching/back-prop scheme is worth exploring. (Btw, all network configurations are the same.)
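For illustration, this is roughly what that Hungarian matching between predicted and ground-truth instances looks like. It is a simplified sketch, not the exact code in this repo: the negative soft-IoU cost and the `match_instances` helper are stand-ins, and `scipy`'s `linear_sum_assignment` is used for the assignment.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_instances(pred_masks, gt_masks, eps=1e-6):
    """Sketch of Hungarian assignment between predicted and GT instance masks.

    pred_masks: (P, N) soft point masks in [0, 1]
    gt_masks:   (G, N) binary point masks
    """
    inter = pred_masks @ gt_masks.T                        # (P, G) soft intersection
    union = (pred_masks.sum(1, keepdims=True)
             + gt_masks.sum(1, keepdims=True).T - inter)   # (P, G) soft union
    cost = -(inter / (union + eps))                        # negative soft IoU as matching cost
    pred_idx, gt_idx = linear_sum_assignment(cost)         # optimal one-to-one assignment
    return pred_idx, gt_idx
```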
(2) I do agree that mAP is a more general way to measure object detection or instance segmentation results. However, the mAP scores reported in the first paper, SGPN, are incorrect according to their released code, as also pointed out in GSPN. For a fair comparison with ASIS, which was the SoTA on S3DIS, we simply follow their mPrec/mRec protocol. For the benefit of the community, I strongly believe a standard mAP protocol and a correct implementation are quite important.
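Roughly, the mPrec/mRec protocol computes, per category, the fraction of predicted instances that overlap a ground-truth instance above an IoU threshold (precision) and the fraction of ground-truth instances that are covered (recall). The sketch below is a simplification (greedy matching, IoU threshold 0.5), not the exact evaluation script:

```python
import numpy as np

def prec_rec_for_category(pred_masks, gt_masks, iou_thresh=0.5):
    """pred_masks, gt_masks: lists of binary point masks for one category."""
    tp = 0
    matched_gt = set()
    for pm in pred_masks:
        best_iou, best_g = 0.0, None
        for g, gm in enumerate(gt_masks):
            inter = np.logical_and(pm, gm).sum()
            union = np.logical_or(pm, gm).sum()
            iou = inter / union if union > 0 else 0.0
            if iou > best_iou:
                best_iou, best_g = iou, g
        if best_iou >= iou_thresh and best_g not in matched_gt:
            tp += 1                     # prediction counts once per GT instance
            matched_gt.add(best_g)
    prec = tp / len(pred_masks) if pred_masks else 0.0
    rec = tp / len(gt_masks) if gt_masks else 0.0
    return prec, rec
```

mPrec/mRec are then the unweighted averages of these per-category values.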
(3) Here are the per-category precision/recall scores of 3D-BoNet (6-fold cross-validation); unfortunately, the per-category results for the other baselines are no longer available.
| Category | Precision | Recall |
|---|---|---|
| ceiling | 0.8852 | 0.6180 |
| floor | 0.8989 | 0.7464 |
| wall | 0.6487 | 0.4999 |
| beam | 0.4230 | 0.4217 |
| column | 0.4801 | 0.2716 |
| window | 0.9301 | 0.6242 |
| door | 0.6676 | 0.5845 |
| table | 0.5539 | 0.4861 |
| chair | 0.7198 | 0.6158 |
| sofa | 0.4972 | 0.2876 |
| bookcase | 0.5830 | 0.2843 |
| board | 0.8074 | 0.4648 |
| clutter | 0.4762 | 0.2860 |
Dear Bo,
Thanks for sharing the code. It was pretty easy to run, but I have a few questions regarding the evaluation on S3DIS.
I just trained your network from scratch on S3DIS simply by running `main_train.py`, but the final evaluation on Area 5 differed from the numbers reported in the paper, and I would like to understand the cause of the discrepancy. In Table 3 of your paper (https://arxiv.org/pdf/1906.01140.pdf), the Area 5 mPrec and mRec are 57.5 and 40.2, but the final results I got were 53.36 and 40.55, respectively.
Are the default settings in the script different from the ones you used for the results reported in the paper?
My second question is: why did you report mPrec and mRec separately? It is quite standard to use mAP, which is the unweighted class-wise average of the area under the precision-recall curve. What are your thoughts on the evaluation metrics?
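For reference, this is roughly what I mean by per-class AP. It is a simplified all-point-interpolation sketch, and it assumes the matching of predictions to ground truth at a fixed IoU threshold (producing the `tp_flags`) has already happened upstream; the function name is just for illustration.

```python
import numpy as np

def average_precision(scores, tp_flags, num_gt):
    """scores: (M,) confidences; tp_flags: (M,) 1 if the prediction matched a GT instance."""
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(tp_flags, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(len(tp)) + 1)
    recall = cum_tp / max(num_gt, 1)
    # monotone precision envelope, then integrate precision over recall
    prec_env = np.maximum.accumulate(precision[::-1])[::-1]
    r = np.concatenate(([0.0], recall))
    p = np.concatenate(([prec_env[0] if len(prec_env) else 0.0], prec_env))
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))
```

mAP would then be the unweighted mean of this value over all classes.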
Finally, would it be possible for you to share the class-wise results of the reported baselines? I would prefer to compute mAP for all baselines, and it would be nice if you could share the results you got on S3DIS.
Thanks! Chris