Closed rsokl closed 1 year ago
Great work! Thank you for introducing your fascinating work :) I will read the tutorials and then update the information about rai-toolbox. By the way, it is remarkable that rai-toolbox dramatically reduces the computational burden. Could you tell me the key point of reducing computational time?
Our approach is unique in two ways:

1. Perturbations are implemented as parameters of an `nn.Module`. E.g., when performing `x + δ` to perturb an image, we have:

```python
import torch
from torch import nn

class AdditivePerturbation(nn.Module):
    def __init__(self, x):
        super().__init__()  # must be called before registering parameters
        self.delta = nn.Parameter(torch.zeros_like(x))

    def forward(self, x):
        return x + self.delta
```

This gives us direct access to the perturbation itself: we differentiate w.r.t. `δ` and not `x + δ`, and we do not have to perform `(x + δ) - x` to access `δ`, as other libraries do. For more details, look here.
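As a quick sketch of the consequence being described (the class is repeated here, with hypothetical toy data, so the snippet runs standalone; this is an illustration, not the toolbox's training loop), the gradient of a loss lands directly on `delta`:

```python
import torch
from torch import nn

class AdditivePerturbation(nn.Module):
    def __init__(self, x):
        super().__init__()
        self.delta = nn.Parameter(torch.zeros_like(x))

    def forward(self, x):
        return x + self.delta

x = torch.rand(1, 3, 8, 8)   # toy "image"; requires no gradient itself
pert = AdditivePerturbation(x)

loss = pert(x).sum()         # any scalar loss of the perturbed input
loss.backward()

# The gradient is attached to delta itself -- no (x + delta) - x round trip.
print(pert.delta.grad.shape)  # torch.Size([1, 3, 8, 8])
```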
2. Perturbation updates are performed not within the `nn.Module` but via torch optimizers, which have access to `δ`. This enables us to perform all normalization/projection/clamping within a `no_grad` context, using strictly in-place operations. The optimizer thus avoids the unnecessary re-allocations of `δ` that other libraries incur. You can read more about our optimizer-based approach here. (One really cool thing about our design is that you can easily compose & chain optimizers; e.g., you can do FGSM, but using sparse gradients and an AdamW optimizer for the update.)

Just wanted to bump this 😄
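To illustrate the general pattern described in point 2 (a simplified sketch of the idea, not the rai-toolbox's actual optimizer API): after a gradient-based step, the projection onto an ε-ball happens under `no_grad` with in-place operations on `δ`, so no new tensors are allocated:

```python
import torch

epsilon = 0.1
delta = torch.zeros(3, 8, 8, requires_grad=True)

# Pretend a backward pass has populated delta.grad:
delta.grad = torch.randn_like(delta)

opt = torch.optim.SGD([delta], lr=0.5)
opt.step()  # gradient-based update of delta

# Projection/clamping is done in-place, outside the autograd graph.
with torch.no_grad():
    delta.clamp_(-epsilon, epsilon)
```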
Sorry for the late response. I updated it in README.md! Awesome work :)
✨ Short description of the feature [tl;dr]
Hello! I wanted to bring your attention to a library that I am an author of: rai-toolbox, which also enables gradient-based perturbations, similar to torch-attack, foolbox, etc.
I have run a modified version of your benchmarks and hope you will include our library in your readme.
💬 Detailed motivation and codes
The rai-toolbox takes a unique approach to gradient-based perturbations: they are implemented in terms of parameter-transforming optimizers and perturbation models. This enables users to implement diverse algorithms (like universal perturbations and concept probing with sparse gradients) using the same paradigm as a standard PGD attack.
As you can see in the following table, our approach also makes us a bit more efficient. The following times are measured per step, so that the methods are more comparable (I also utilized a specialized timer that ensures more accurate timing of CUDA-dependent operations; feel free to use it in your benchmarks too).
(timed on a GeForce GTX 1080)
Model: Madry-Robust ResNet
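For context on why CUDA-dependent operations need special care when timing: CUDA kernels launch asynchronously, so a wall-clock measurement must synchronize the device before reading the clock. A minimal sketch of such a timer (an illustration of the idea, not the toolbox's timer utility; `timed` is a hypothetical helper):

```python
import time
import torch

def timed(fn, *args):
    """Time fn(*args), synchronizing CUDA so queued kernels are included."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    result = fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for any asynchronous kernels to finish
    return result, time.perf_counter() - start

out, seconds = timed(torch.matmul, torch.rand(256, 256), torch.rand(256, 256))
```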