Runjian-Chen / CO3

MIT License

Question about results with BYOL and STRL #3

Closed liuQuan98 closed 1 year ago

liuQuan98 commented 1 year ago

Hi Chen!

Thank you again for your excellent paper! I am quite curious why STRL is reported to score much higher than BYOL in your paper (Tables 1 & 2), which surprised me at first, since STRL is essentially BYOL extended to the 3D realm with little modification.

I guess it is because BYOL is trained with a point-wise objective (akin to your Cooperative Contrastive Objective but without negatives), while STRL is trained with a global objective (i.e., predicting the global feature vector of a corresponding view). It seems counter-intuitive that a finer-grained pre-training target yields worse results.
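To make the distinction concrete, here is a toy sketch of what I mean (my own illustration in NumPy, not code from the paper): a point-wise objective aligns each corresponding per-point embedding across the two views, while a global objective only aligns the pooled feature vectors.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity along the last axis.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return np.sum(a * b, axis=-1)

def pointwise_loss(feats_a, feats_b):
    # Point-wise (registration-style) objective: align every pair of
    # corresponding point embeddings, then average over points.
    # feats_*: (N, D) per-point embeddings, rows assumed in correspondence.
    return float(np.mean(1.0 - cosine_sim(feats_a, feats_b)))

def global_loss(feats_a, feats_b):
    # Global (BYOL/STRL-style) objective: pool each view into a single
    # feature vector and align only the pooled vectors.
    return float(1.0 - cosine_sim(feats_a.mean(axis=0), feats_b.mean(axis=0)))

rng = np.random.default_rng(0)
feats = rng.normal(size=(128, 32))        # per-point embeddings of view 1
noisy = feats + 0.1 * rng.normal(size=feats.shape)  # embeddings of view 2
print(pointwise_loss(feats, noisy), global_loss(feats, noisy))
```

The point-wise loss constrains every point embedding, so it is the finer-grained target of the two; the global loss is satisfied as long as the pooled statistics match.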

I am not questioning the numbers, though; I would be glad in any case if you were kind enough to open-source the baseline methods you implemented. I am more curious about the high-level implications: does this imply that point-wise contrastive learning (i.e., registration-style pretraining) is not a good pre-training objective even compared with a global objective?

Runjian-Chen commented 1 year ago

Hello Liu Quan,

Thank you for your interest in our work and for the meaningful question. As for the baselines, we implemented them on Huawei's servers and the code is confidential, so I am sorry that I cannot open-source them. One potential explanation for the phenomenon is that BYOL is applied to views built with linear augmentations of the point cloud, which may not be suitable for contrastive-based methods, whereas STRL builds views from samples at different timestamps, which might be more meaningful for pre-training the backbone. This could account for the score difference.
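To sketch the difference in view construction (a toy illustration with made-up parameters, not our actual pipeline): a linear augmentation produces a second view by a synthetic rigid transform of one frame, while temporal views take two real frames from the same sequence.

```python
import numpy as np

def linear_augment_view(points, rng):
    # BYOL-style view: a random rigid (linear) transform of a single
    # point cloud -- here a random rotation about z plus a small shift.
    theta = rng.uniform(0.0, 2.0 * np.pi)
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta),  np.cos(theta), 0.0],
                    [0.0,            0.0,           1.0]])
    shift = rng.normal(scale=0.1, size=3)
    return points @ rot.T + shift

def temporal_views(sequence, gap=1):
    # STRL-style views: two point clouds from nearby timestamps of the
    # same sequence, so the views differ by real scene dynamics rather
    # than by a synthetic transform.
    return sequence[0], sequence[gap]

rng = np.random.default_rng(0)
frames = [rng.normal(size=(64, 3)) for _ in range(4)]  # toy "sequence"
view_a = linear_augment_view(frames[0], rng)           # synthetic view
view_b, view_c = temporal_views(frames)                # temporal views
```

The temporal pairing exposes the backbone to genuine motion and occlusion changes, which a purely linear transform of one frame cannot provide.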

However, in my opinion, a point-level pre-training objective should be better than a global pretext task, especially for fine-grained downstream tasks like detection and segmentation.

Best, Runjian