HaoranTang / cl_spatial_inductive_bias

Official PyTorch code for the ICCV '23 paper "Contrastive Learning Relies More on Spatial Inductive Bias Than Supervised Learning: An Empirical Study"
https://haorantang.github.io/cl_spatial_inductive_bias/
MIT License

Why define the spatial inductive bias this way? #3

Closed · Lazzzycoding closed 5 months ago

Lazzzycoding commented 8 months ago

Dear authors @mikuhatsune @immortalCO @HaoranTang

Thanks for your novel work!

I have several questions I would like to clarify. First, why is the spatial inductive bias defined the way it is in this paper? Is it because the paper mainly studies shuffled global and local patches, i.e., images whose spatial structure and information have been disturbed? Second, could this idea of shuffling local and global spatial structure be transferred to 3D point clouds? Contrastive learning is now also applied to 3D tasks, so how should the spatial inductive bias be understood and defined there, and could it be transferred to contrastive learning on 3D tasks?

Also, if the dataset is not disturbed, does the spatial inductive bias still play a role in the contrastive learning loss?

Thanks a lot.

HaoranTang commented 5 months ago

Thanks for your interest! In the context of image data, we view the spatial inductive bias as the fixed inner positioning of the different parts of an object. For example, on a normal four-wheel car, the doors always sit between the wheels regardless of the car's overall position and the background. Such inductive bias is not destroyed by CL data augmentations, and we hypothesize that CL exploits this information for representation learning. Hence, we develop shuffling methods that destroy the inner positioning (the spatial inductive bias), empirically study this phenomenon, and explain/analyze the feature space using existing theory.
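To make this concrete, a minimal PyTorch sketch of such a patch-shuffling operation could look like the following. The function name and the `grid` granularity knob are illustrative here, not the exact API or settings of this repo:

```python
import torch

def shuffle_patches(img: torch.Tensor, grid: int = 4) -> torch.Tensor:
    """Randomly permute the grid x grid patches of an image tensor (C, H, W).

    Illustrative sketch only; assumes H and W are divisible by `grid`.
    Shuffling the patches destroys the inner positioning of object parts
    while keeping the low-level patch statistics intact.
    """
    c, h, w = img.shape
    ph, pw = h // grid, w // grid
    # split into (grid*grid) patches of shape (C, ph, pw)
    patches = (img.reshape(c, grid, ph, grid, pw)
                  .permute(1, 3, 0, 2, 4)
                  .reshape(grid * grid, c, ph, pw))
    # random permutation of the patch positions
    patches = patches[torch.randperm(grid * grid)]
    # reassemble the shuffled patches into a full image
    return (patches.reshape(grid, grid, c, ph, pw)
                   .permute(2, 0, 3, 1, 4)
                   .reshape(c, h, w))
```

Applying something like this before the standard CL augmentations approximates the "disturbed" setting you asked about, while leaving the undisturbed pipeline as the control.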

Personally I think it applies to 3D as well: given a point-cloud airplane, the points of the fuselage normally lie between the points of the two wings. Destroying that bias for the airplane should likewise hinder CL from learning its representation.
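For instance, a quick toy sketch of destroying such 3D inner positioning (my own construction, not part of our released code) could swap the two halves of a point cloud along one axis:

```python
import torch

def swap_point_regions(points: torch.Tensor, axis: int = 0) -> torch.Tensor:
    """Swap the two halves of a point cloud (N, 3) along one axis.

    Toy illustration: points on either side of the median along `axis`
    are translated so each half lands roughly where the other half was
    (e.g., moving an airplane's nose toward the tail and vice versa),
    breaking the relative positioning of parts.
    """
    coord = points[:, axis]
    left = coord < coord.median()
    out = points.clone()
    # offset between the centroids of the two halves
    offset = points[~left].mean(dim=0) - points[left].mean(dim=0)
    out[left] += offset
    out[~left] -= offset
    return out
```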

However, we consider this dependence of CL on spatial inductive bias a property of how it learns, not necessarily a shortcoming to fix. The intuition behind this work is to figure out what kind of information CL uses to learn. Sorry for the late reply, I hope this helps : )