Trusted-AI / adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
https://adversarial-robustness-toolbox.readthedocs.io/en/latest/
MIT License

Support for user-defined adversarial criteria in black-box evasion attacks #1134

Open beat-buesser opened 3 years ago

beat-buesser commented 3 years ago

Currently, most black-box evasion attacks (except SquareAttack from ART 1.7.0 with #1127 provided by @jacobalanbond) contain classification-specific code to check whether a candidate example is adversarial. We would like to investigate and, if possible, implement user-defined adversarial criteria following #1127 for all or most black-box evasion attacks. This would make the black-box evasion attacks independent of the machine learning task (e.g., classification) and provide support for a large number of other tasks (e.g., lane detection, segmentation, etc.). We are planning to define a new API for adversarial criteria, extend ART's estimator API with an adversarial criterion property, implement it for the existing task- and model-specific estimators where possible, and apply it in the black-box evasion attacks.
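
For illustration only, such a criterion could be a callable that maps model predictions to a per-sample boolean mask, so the attack loop never needs task-specific logic. The function names and signatures below are hypothetical and not part of ART's API:

```python
import numpy as np

# Hypothetical sketch of user-defined adversarial criteria: each criterion
# maps predictions to a per-sample boolean mask. Names and signatures are
# illustrative only, not ART's actual API.

def untargeted_misclassification(y_pred: np.ndarray, y_true: np.ndarray) -> np.ndarray:
    """Classification: a candidate is adversarial if the predicted class
    differs from the true class."""
    return np.argmax(y_pred, axis=1) != np.argmax(y_true, axis=1)

def segmentation_iou_below(y_pred: np.ndarray, y_true: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Segmentation: a candidate is adversarial if the per-sample IoU of the
    predicted mask with the ground truth drops below a threshold."""
    pred_mask, true_mask = y_pred > 0.5, y_true > 0.5
    intersection = np.logical_and(pred_mask, true_mask).sum(axis=(1, 2))
    union = np.logical_or(pred_mask, true_mask).sum(axis=(1, 2))
    return intersection / np.maximum(union, 1) < threshold
```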

beat-buesser commented 3 years ago

We should also consider BlackBoxEstimator for this issue, similar to BlackBoxClassifier.

moohax commented 2 years ago

We use an overridable function in the estimator (our target): https://github.com/Azure/counterfit/blob/dae55c29e9f27ac5d9a99a280bdc23f2a4b26bbd/counterfit/core/targets.py#L179

It's not perfect, and it only works for evasion.
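
A minimal sketch of that overridable-function pattern (all class and method names here are illustrative, not Counterfit's or ART's actual code): the attack loop calls only the override point, so supporting a new task means overriding one method.

```python
import numpy as np

class Target:
    """Sketch of an attack target with an overridable success criterion.
    The attack loop calls only `is_adversarial`, so it stays task-agnostic."""

    def predict(self, x: np.ndarray) -> np.ndarray:
        raise NotImplementedError

    def is_adversarial(self, x: np.ndarray, y: np.ndarray) -> np.ndarray:
        # Default criterion for classification; subclasses override this
        # for other tasks.
        return np.argmax(self.predict(x), axis=1) != np.argmax(y, axis=1)


class LaneDetectionTarget(Target):
    def is_adversarial(self, x: np.ndarray, y: np.ndarray) -> np.ndarray:
        # Task-specific override: adversarial if the mean deviation of the
        # predicted lane points exceeds a (hypothetical) tolerance.
        return np.abs(self.predict(x) - y).mean(axis=(1, 2)) > 0.1
```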

beat-buesser commented 2 years ago

I think this could be an interesting approach. I'm wondering if there is a good pattern to automate checking the return format of the overridden function.

moohax commented 2 years ago

Perhaps a functools.partial?

beat-buesser commented 2 years ago

Interesting, do you mean in combination with inspect.signature?
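
One possible combination, sketched below with hypothetical names: inspect.signature verifies the override's parameters at registration time, a wrapper checks the return format at call time, and functools.partial binds any extra criterion parameters up front so the attack always calls criterion(y_pred, y_true).

```python
import functools
import inspect
import numpy as np

def validate_criterion(fn):
    """Hypothetical sketch: check at registration time that an overriding
    criterion accepts (y_pred, y_true), and wrap it so the return format
    is verified on every call."""
    params = list(inspect.signature(fn).parameters)
    if params[:2] != ["y_pred", "y_true"]:
        raise TypeError(f"criterion must accept (y_pred, y_true), got {params}")

    def wrapped(y_pred, y_true):
        result = np.asarray(fn(y_pred, y_true))
        if result.dtype != np.bool_ or result.shape[0] != len(y_pred):
            raise TypeError("criterion must return one boolean per sample")
        return result

    return wrapped


def confidence_drop(y_pred, y_true, threshold=0.5):
    # Example criterion with an extra parameter.
    return np.max(y_pred, axis=1) < threshold

# functools.partial binds the extra parameter beforehand, so the attack can
# always invoke the criterion as criterion(y_pred, y_true).
criterion = validate_criterion(functools.partial(confidence_drop, threshold=0.3))
```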