Trusted-AI / adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
https://adversarial-robustness-toolbox.readthedocs.io/en/latest/
MIT License

Support for user-defined adversarial criteria in black-box evasion attacks #1134

Open beat-buesser opened 3 years ago

beat-buesser commented 3 years ago

Currently, most black-box evasion attacks (except SquareAttack from ART 1.7.0 with #1127 provided by @jacobalanbond) contain classification-specific code to check whether a candidate example is adversarial. We would like to investigate and, if possible, implement user-defined adversarial criteria following #1127 for all or most black-box evasion attacks. This would make the black-box evasion attacks independent of the machine learning task (e.g., classification) and provide support for a large number of other tasks (e.g., lane detection, segmentation, etc.). We are planning to define a new API for adversarial criteria, extend ART's estimator API with an adversarial criterion property, implement it for the existing task- and model-specific estimators where possible, and apply it in the black-box evasion attacks.
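
For illustration only, such a criterion could be a callable that maps model predictions to a per-sample boolean mask, so the attack loop never needs task-specific logic. The function names and signatures below are hypothetical and not part of ART's API:

```python
import numpy as np

# Hypothetical sketch of user-defined adversarial criteria: each criterion
# maps predictions to a per-sample boolean mask. Names and signatures are
# illustrative only, not ART's actual API.

def untargeted_misclassification(y_pred: np.ndarray, y_true: np.ndarray) -> np.ndarray:
    """Classification: a candidate is adversarial if the predicted class
    differs from the true class."""
    return np.argmax(y_pred, axis=1) != np.argmax(y_true, axis=1)

def segmentation_iou_below(y_pred: np.ndarray, y_true: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Segmentation: a candidate is adversarial if the per-sample IoU of the
    predicted mask with the ground truth drops below a threshold."""
    pred_mask, true_mask = y_pred > 0.5, y_true > 0.5
    intersection = np.logical_and(pred_mask, true_mask).sum(axis=(1, 2))
    union = np.logical_or(pred_mask, true_mask).sum(axis=(1, 2))
    return intersection / np.maximum(union, 1) < threshold
```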

beat-buesser commented 3 years ago

We should also consider BlackBoxEstimator for this issue, similar to BlackBoxClassifier.

moohax commented 2 years ago

We use an overridable function in the estimator (our target): https://github.com/Azure/counterfit/blob/dae55c29e9f27ac5d9a99a280bdc23f2a4b26bbd/counterfit/core/targets.py#L179

It's not perfect, and it only works for evasion.
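
A minimal sketch of that overridable-function pattern (all class and method names here are illustrative, not Counterfit's or ART's actual code): the attack loop calls only the override point, so supporting a new task means overriding one method.

```python
import numpy as np

class Target:
    """Sketch of an attack target with an overridable success criterion.
    The attack loop calls only `is_adversarial`, so it stays task-agnostic."""

    def predict(self, x: np.ndarray) -> np.ndarray:
        raise NotImplementedError

    def is_adversarial(self, x: np.ndarray, y: np.ndarray) -> np.ndarray:
        # Default criterion for classification; subclasses override this
        # for other tasks.
        return np.argmax(self.predict(x), axis=1) != np.argmax(y, axis=1)


class LaneDetectionTarget(Target):
    def is_adversarial(self, x: np.ndarray, y: np.ndarray) -> np.ndarray:
        # Task-specific override: adversarial if the mean deviation of the
        # predicted lane points exceeds a (hypothetical) tolerance.
        return np.abs(self.predict(x) - y).mean(axis=(1, 2)) > 0.1
```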

beat-buesser commented 2 years ago

I think this could be an interesting approach. I'm wondering if there is a good pattern to automate checking the return format of the overridden function.

moohax commented 2 years ago

Perhaps a functools.partial?

beat-buesser commented 2 years ago

Interesting, do you mean in combination with inspect.signature?
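
One possible combination, sketched below with hypothetical names: inspect.signature verifies the override's parameters at registration time, a wrapper checks the return format at call time, and functools.partial binds any extra criterion parameters up front so the attack always calls criterion(y_pred, y_true).

```python
import functools
import inspect
import numpy as np

def validate_criterion(fn):
    """Hypothetical sketch: check at registration time that an overriding
    criterion accepts (y_pred, y_true), and wrap it so the return format
    is verified on every call."""
    params = list(inspect.signature(fn).parameters)
    if params[:2] != ["y_pred", "y_true"]:
        raise TypeError(f"criterion must accept (y_pred, y_true), got {params}")

    def wrapped(y_pred, y_true):
        result = np.asarray(fn(y_pred, y_true))
        if result.dtype != np.bool_ or result.shape[0] != len(y_pred):
            raise TypeError("criterion must return one boolean per sample")
        return result

    return wrapped


def confidence_drop(y_pred, y_true, threshold=0.5):
    # Example criterion with an extra parameter.
    return np.max(y_pred, axis=1) < threshold

# functools.partial binds the extra parameter beforehand, so the attack can
# always invoke the criterion as criterion(y_pred, y_true).
criterion = validate_criterion(functools.partial(confidence_drop, threshold=0.3))
```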