Closed zzzucf closed 2 years ago
Great question! In these cases, since you do alignment of the features, you would first learn an alignment matrix that maps from e.g. 2048 to 512 dimensions, apply this alignment to the classifier vectors, and then compute the cosine similarity in the 512-dimensional space.
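A minimal sketch of this procedure (not the repo's actual code; the synthetic features, the least-squares fit, and the names `M`, `w`, `u` are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired features for the same images from two networks:
# f_big is 2048-dim (e.g. ResNet-50), f_small is 512-dim (e.g. ResNet-18).
n = 1000
f_small = rng.standard_normal((n, 512))
true_map = rng.standard_normal((512, 2048))        # synthetic ground truth
f_big = f_small @ true_map + 0.01 * rng.standard_normal((n, 2048))

# Learn an alignment matrix M (2048 -> 512) by least squares:
# minimize ||f_big @ M - f_small||^2 over M.
M, *_ = np.linalg.lstsq(f_big, f_small, rcond=None)  # M has shape (2048, 512)

# Apply the alignment to a 2048-dim classifier vector, then compute the
# cosine similarity with a 512-dim carrier direction in the smaller space.
w = rng.standard_normal(2048)   # classifier vector of the larger network
u = rng.standard_normal(512)    # carrier (watermark) direction
w_aligned = w @ M               # now 512-dimensional

cos = w_aligned @ u / (np.linalg.norm(w_aligned) * np.linalg.norm(u))
print(cos)
```

The key point is only the shapes: once the classifier vector is pushed through `M`, both vectors live in the same 512-dimensional space and the cosine is well defined.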
I thought about what you proposed, and what concerns me most is that the transformation M would not have a unique solution, since it is no longer dxd but dxd' with d >> d'. In that case there exist multiple transformations, which might represent completely different directions in which to transform the classifier. So how can you explain that the radioactive mark would still work when there exist multiple transformed classifier watermarks?
Given that the alignment goes in the "good" direction (i.e. reducing from the larger dimension d to the smaller d'), I don't expect this to be a problem.
Hi, in Section 5.4, Architecture transfer, of the original paper, Table 3 presents results for different architectures. However, ResNet-50, DenseNet-121, and VGG-16 have different feature sizes: 2048, 1024, and 4096, respectively. How did you compute the cosine similarity when the feature sizes differ? And why do we need a p-value instead of the angle from the cosine similarity? Wouldn't the angle be more direct?
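On the p-value question, one way to see why the raw angle is hard to interpret on its own: the cosine between a fixed direction and a random vector concentrates around 0 at a rate that depends on the dimension, so the same angle can be highly significant in one feature space and unremarkable in another. A small Monte Carlo sketch (illustrative only, not the paper's test statistic):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cosines(d, trials=5000):
    """Cosines between a fixed direction and random unit vectors in R^d."""
    u = np.zeros(d)
    u[0] = 1.0                                   # fixed reference direction
    v = rng.standard_normal((trials, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)  # normalize rows
    return v @ u

c512 = random_cosines(512)
c2048 = random_cosines(2048)

# Random cosines center on 0, and their spread shrinks roughly like
# 1/sqrt(d), so a given cosine value is "rarer" in higher dimension.
print(c512.std(), c2048.std())
```

This is why a p-value (the probability of observing at least that cosine under the null hypothesis of a random direction) is a more comparable measure across architectures than the angle itself.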