Closed — liuQuan98 closed this issue 1 year ago
Hello Liu Quan,
Thank you for your interest in our work and for the meaningful question. We implemented the baselines on Huawei's servers, and that code is confidential, so I am sorry that I am not able to open-source them. One potential explanation for the phenomenon is that BYOL is trained on views built with linear augmentations of the point clouds, which may not be well suited to contrastive-style methods. STRL, in contrast, builds its views from samples at different timestamps, which might be more meaningful for pre-training the backbone. These differences might account for the score gap.
That said, in my opinion a point-level pre-training objective should outperform a global pretext task, especially for fine-grained downstream tasks like detection and segmentation.
Best, Runjian
Hi Chen!
Thank you again for your excellent paper! I am quite curious why STRL is reported to score much higher than BYOL in your paper (Tables 1 & 2). This surprised me at first, since STRL is essentially BYOL extended to the 3D domain with little modification.
My guess is that BYOL is trained with a point-wise objective (akin to your Cooperative Contrastive Objective, but without negatives), while STRL is trained with a global objective (i.e., predicting the global feature vector of a corresponding view). It seems counter-intuitive that a finer-grained pre-training target yields worse results.
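To make the distinction concrete, here is a toy numpy sketch of the two objectives as I understand them — this is my own illustration, not the paper's implementation, and all function names (`pointwise_objective`, `global_objective`) and the assumption that corresponding points are already matched row-by-row are mine:

```python
import numpy as np

def cosine_loss(a, b):
    # BYOL-style loss: mean (1 - cosine similarity) between L2-normalized features
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(a * b, axis=-1)))

def pointwise_objective(feats_a, feats_b):
    # Finer-grained target: compare features of corresponding points directly.
    # Assumes rows of the two (N, D) arrays are already matched pairs.
    return cosine_loss(feats_a, feats_b)

def global_objective(feats_a, feats_b):
    # STRL/BYOL-style global target: pool each view to a single vector, then compare.
    return cosine_loss(feats_a.mean(axis=0, keepdims=True),
                       feats_b.mean(axis=0, keepdims=True))

# Demo: two views as slightly perturbed copies of the same point features
rng = np.random.default_rng(0)
f_a = rng.normal(size=(128, 32))
f_b = f_a + 0.1 * rng.normal(size=(128, 32))

loss_point = pointwise_objective(f_a, f_b)
loss_global = global_objective(f_a, f_b)
```

The point-wise version supervises every point's feature individually, while the global version only constrains the pooled representation — which is exactly why a point-wise target intuitively seems better suited to dense tasks.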
I am not questioning the numbers themselves — though I would be glad if you were kind enough to open-source the baseline implementations. I am more curious about the high-level implication: does this mean that point-wise contrastive learning (i.e., registration-style pre-training) is not a good objective even compared with a global one?