Closed by mvnowak 6 months ago
Oh yeah, it should have sampled from $\mathbb{R}_{-}$ from the start! Thanks for looking into this, finding it, and pushing a fix.
30% is a bit higher than I would expect from a purely passive attack, though. Looking at the code and the notebook again after a long time: is this due to a skew introduced in a roundabout way from the bias offset `mu`?
> is this due to a skew introduced in a roundabout way from the bias offset `mu`?
That's possible, I'll look into it in more detail in the coming weeks and report back with my findings.
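One quick way to probe this hunch (a hypothetical sketch, not the repository's code): count how many random inputs activate a single trap neuron for different bias offsets. Even with symmetric zero-mean weights, a nonzero offset `mu` skews the activation rate, which could plausibly inflate the fraction of single-image activations.

```python
import torch

torch.manual_seed(0)

def activation_rate(mu, n_inputs=10000, n_features=64):
    # Symmetric (zero-mean) random weights for one trap neuron.
    w = torch.randn(n_features)
    x = torch.randn(n_inputs, n_features)
    # ReLU-style gate: the neuron fires when w.x + mu > 0.
    return ((x @ w + mu) > 0).float().mean().item()

# A positive bias offset raises the fraction of activating inputs
# above the symmetric 50% baseline.
print(activation_rate(0.0), activation_rate(2.0))
```

This only shows that `mu` shifts the per-neuron activation probability; whether that fully explains the 30% figure would need checking against the actual notebook.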
In the meantime, I'll merge these fixes already!
The current implementation of the trap weight initialization is flawed: it samples the negative weights using `torch.randn`, which yields mixed positive and negative values. I fixed this by wrapping the call in `torch.abs` and multiplying by -1 to ensure strictly negative weights, similar to how it is done in this implementation.

Additionally, I changed the sign of the scale factor in the ImageNet example notebook, as it was negative, which led to a double negative in the line `positive_samples = -scale_factor * sampled_weights`.

The fact that the example notebook still reproduced images despite these flaws can simply be attributed to the chance of single-image neuron activations occurring even without adversarial weight initialization (see the passive attack of the CAH paper).
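The two fixes above can be sketched roughly as follows (function and parameter names here are illustrative, not the repository's actual API):

```python
import torch

def trap_weight_init(n_trap, n_features, scale_factor=1.0):
    # torch.randn alone yields mixed signs; wrapping it in torch.abs
    # and negating guarantees strictly non-positive trap weights.
    sampled_weights = -torch.abs(torch.randn(n_trap, n_features))
    # With a positive scale_factor, the negation here is the only
    # sign flip, so the resulting samples are non-negative.
    positive_samples = -scale_factor * sampled_weights
    return sampled_weights, positive_samples
```

With a negative `scale_factor`, as in the old notebook, the two minus signs would cancel and `positive_samples` would come out negative, which is what the sign change fixes.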
With these adjustments, the example notebook reproduces ~50% of the training images, as opposed to ~30% previously.