Closed by mvnowak 6 months ago
Oh yeah, it should have sampled from $\mathbb{R}_{-}$ from the start! Thanks for looking into this, finding it, and pushing a fix.
30% is a bit higher than I would expect from a purely passive attack, though. Looking at the code and the notebook again after a long time: is this due to a skew introduced in a roundabout way from the bias offset `mu`?
> is this due to a skew introduced in a roundabout way from the bias offset `mu`?
That's possible, I'll look into it in more detail in the coming weeks and report back with my findings.
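One quick way to probe this hunch (a hypothetical sketch, not the repository's code): count how many random inputs activate a single trap neuron for different bias offsets. Even with symmetric zero-mean weights, a nonzero offset `mu` skews the activation rate, which could plausibly inflate the fraction of single-image activations.

```python
import torch

torch.manual_seed(0)

def activation_rate(mu, n_inputs=10000, n_features=64):
    # Symmetric (zero-mean) random weights for one trap neuron.
    w = torch.randn(n_features)
    x = torch.randn(n_inputs, n_features)
    # ReLU-style gate: the neuron fires when w.x + mu > 0.
    return ((x @ w + mu) > 0).float().mean().item()

# A positive bias offset raises the fraction of activating inputs
# above the symmetric 50% baseline.
print(activation_rate(0.0), activation_rate(2.0))
```

This only shows that `mu` shifts the per-neuron activation probability; whether that fully explains the 30% figure would need checking against the actual notebook.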
In the meantime, I'll merge these fixes already!
The current implementation of the trap weight initialization is flawed: it samples the negative weights using `torch.randn`, which yields mixed positive and negative values. I fixed this by wrapping the call in `torch.abs` and multiplying by -1 to ensure strictly negative weights, similar to how it is done in this implementation.

Additionally, I changed the sign of the scale factor in the ImageNet example notebook, as it was negative, which led to a double negative in the line `positive_samples = -scale_factor * sampled_weights`.

The fact that the example notebook still reproduced images despite these flaws can simply be attributed to the chance of single-image neuron activations occurring even without adversarial weight initialization (see the passive attack of the CAH paper).
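The two fixes above can be sketched roughly as follows (function and parameter names here are illustrative, not the repository's actual API):

```python
import torch

def trap_weight_init(n_trap, n_features, scale_factor=1.0):
    # torch.randn alone yields mixed signs; wrapping it in torch.abs
    # and negating guarantees strictly non-positive trap weights.
    sampled_weights = -torch.abs(torch.randn(n_trap, n_features))
    # With a positive scale_factor, the negation here is the only
    # sign flip, so the resulting samples are non-negative.
    positive_samples = -scale_factor * sampled_weights
    return sampled_weights, positive_samples
```

With a negative `scale_factor`, as in the old notebook, the two minus signs would cancel and `positive_samples` would come out negative, which is what the sign change fixes.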
With these adjustments, the example notebook reproduces ~50% of the training images, as opposed to ~30% previously.