Linkerbrain / fact-ai-2021-project


Attack the gradients with knowledge of the transformations used #3

Open · ole2252 opened this issue 2 years ago

alvitawa commented 2 years ago

This has a few components. First, one should analyze the distribution of images after certain (sequences of) transforms. The easiest way to do this is to generate random images, transform them, and then calculate the first n moments of the pixel distribution of the transformed outputs (mean, variance, and so on). Instead of random images one could also use real images from the dataset or a similar dataset, but we should not assume the attacker has access to that kind of information. Either way, we can build a loss term through moment matching (the squared distance between the expected and actual moments), which is added to the gradient similarity.
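
A minimal sketch of what that term could look like, assuming PyTorch and a tensor-level `augment` callable (e.g. torchvision transforms applied to tensors); the function names and the choice of four moments are illustrative, not from this repo:

```python
import torch

def pixel_moments(images, n_moments=4):
    # First n moments of the pooled pixel distribution:
    # the mean, then central moments E[(x - mean)^k] for k = 2..n.
    x = images.flatten()
    mean = x.mean()
    moments = [mean]
    for k in range(2, n_moments + 1):
        moments.append(((x - mean) ** k).mean())
    return torch.stack(moments)

def expected_moments_from_noise(augment, n_images=1024, shape=(3, 32, 32)):
    # Estimate the expected moments by augmenting random noise images,
    # since the attacker is not assumed to have access to real data.
    noise = torch.rand(n_images, *shape)
    augmented = torch.stack([augment(img) for img in noise])
    return pixel_moments(augmented)

def moment_matching_loss(reconstruction, expected):
    # Squared distance between expected and actual moments; this term
    # gets added to the gradient-similarity objective of the attack.
    return ((pixel_moments(reconstruction) - expected) ** 2).sum()
```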

A more complex but cooler approach is to train a neural network to recognize which augmentations have been applied to an image, i.e. a network that gives a low score when an image is likely to have been produced by the known augmentations. Simply add this network's output for the image reconstruction attempt to the gradient similarity.
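
As a hedged sketch of that variant: a small CNN could be trained (binary cross-entropy on augmented vs. clean images) to estimate the probability that an image went through the known augmentation pipeline, and the negative log-probability (low when the image is likely augmented) is added to the attack objective. The architecture and names here are placeholders, not the project's actual setup:

```python
import torch
import torch.nn as nn

class AugmentationCritic(nn.Module):
    # Binary classifier: sigmoid(logit) estimates p(image went through
    # the known augmentations).
    def __init__(self, in_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

def augmentation_penalty(critic, reconstruction):
    # -log p(augmented): low when the reconstruction plausibly passed
    # through the known augmentations, so adding it to the gradient
    # similarity steers the attack toward such images.
    p = torch.sigmoid(critic(reconstruction))
    return -torch.log(p + 1e-8).mean()
```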

alvitawa commented 2 years ago

The notion that augmentations are fundamentally a way to make the model invariant to some aspect (e.g. rotation) provides some insight into why these augmentations make reconstruction harder. Namely, if a model is invariant to x, it will produce the same output regardless of which variation of x is applied to an input image, so it would be impossible to know which x variation was applied. However, note that what is used for the reconstruction is not the output of the network but its gradients, which will not be x-invariant. They might, however, become increasingly invariant as the layers of the network progress.

If this hypothesis about increasing invariance is true, it might mean that the gradients of the first layers are more useful for reconstructing the augmented image.
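
A quick way to test this would be to down-weight the per-layer gradient similarity with depth, so the (hypothetically less invariant) early layers dominate. A sketch under that assumption, with `decay` as an illustrative knob rather than code from this repo:

```python
import torch
import torch.nn.functional as F

def weighted_gradient_loss(grads_true, grads_fake, decay=0.9):
    # Per-layer (1 - cosine similarity), with early layers weighted most,
    # since they are hypothesized to be least invariant to the augmentation.
    loss, total_weight = 0.0, 0.0
    for i, (g_t, g_f) in enumerate(zip(grads_true, grads_fake)):
        w = decay ** i
        cos = F.cosine_similarity(g_t.flatten(), g_f.flatten(), dim=0)
        loss = loss + w * (1.0 - cos)
        total_weight += w
    return loss / total_weight
```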

alvitawa commented 2 years ago

Preliminary results:

[image attached]