Trusted-AI / adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
https://adversarial-robustness-toolbox.readthedocs.io/en/latest/
MIT License

SklearnClassifier produces probabilities instead of logits for DeepFool #826

Closed · zacps closed this issue 3 years ago

zacps commented 3 years ago

Describe the bug
Running DeepFool on a SklearnClassifier results in probabilities instead of logits being provided to DeepFool.

To Reproduce (pseudocode only):

from sklearn.svm import SVC
from art.estimators.classification import SklearnClassifier
from art.attacks.evasion import DeepFool

X, y = ...  # any labelled dataset
classifier = SklearnClassifier(model=SVC(kernel='linear', probability=True))
classifier.fit(X, y)
attack = DeepFool(classifier=classifier)

adversarial_examples = attack.generate(X)

Results in the warning:

It seems that the attacked model is predicting probabilities. DeepFool expects logits as model output to achieve its full attack strength.

Expected behavior
The SklearnClassifier should use predict_log_proba instead of predict_proba when providing probability estimates to the DeepFool attack.
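For context, a minimal sketch (not part of the original report; the load_iris data is an illustrative assumption) showing that sklearn's predict_log_proba is just the log of predict_proba, i.e. it differs from softmax logits only by an additive per-sample constant:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Illustrative data only; any labelled dataset would do.
X, y = load_iris(return_X_y=True)
model = SVC(kernel='linear', probability=True).fit(X, y)

probs = model.predict_proba(X[:5])          # rows sum to 1
log_probs = model.predict_log_proba(X[:5])  # log of the above
assert np.allclose(np.log(probs), log_probs)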


beat-buesser commented 3 years ago

Hi @zacps, thank you for raising this interesting issue.

So far the attacks don't modify the model output, so a user can run an attack with either probabilities or logits, independent of the attack's preference. But I see your point, and I would propose that we add a new argument to SklearnClassifier called use_logits (following other ART estimators) that defines whether SklearnClassifier.predict uses predict_log_proba or predict_proba. What do you think?
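A minimal sketch of how such a use_logits flag might behave, using a simplified stand-in class (SklearnClassifierSketch and its internals are hypothetical, not ART's actual implementation):

class SklearnClassifierSketch:
    # Hypothetical, simplified stand-in for ART's SklearnClassifier.
    def __init__(self, model, use_logits=False):
        self.model = model
        self.use_logits = use_logits

    def predict(self, x):
        if self.use_logits:
            # Return log-probabilities (logit-like output).
            return self.model.predict_log_proba(x)
        return self.model.predict_proba(x)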

Would you be interested in implementing a solution for one of ART's next releases?

zacps commented 3 years ago

Would it be better to add a predict_log_proba method to the SklearnClassifier? DeepFool could check for it and use it if it exists (like the check in scikitlearn.py#L164).
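A sketch of that duck-typing check, under the assumption that the attack probes the estimator at run time (the helper name is hypothetical):

def model_output(classifier, x):
    # Hypothetical helper: prefer log-probabilities when the estimator
    # exposes them, otherwise fall back to plain probabilities.
    if hasattr(classifier, 'predict_log_proba'):
        return classifier.predict_log_proba(x)
    return classifier.predict(x)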

Yes, I should have time to implement something.

beat-buesser commented 3 years ago

That's great! Let me know anytime if you have questions.

I think your approach would work, but it would not allow a user to run the attack with probabilities, for example for comparison purposes.

I think we should still go for a new argument use_logits, as that would be more consistent with the overall ART pattern of keeping model-related settings on the estimator side and letting the attack work with whatever output the estimator provides. But new checks like the one in scikitlearn.py#L164 would still be required in SklearnClassifier, to verify that the provided sklearn model implements predict_log_proba when use_logits=True.
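A sketch of the kind of check described above (the function name and error message are assumptions, not ART code):

def _check_log_proba_support(model, use_logits):
    # Hypothetical validation: fail fast when use_logits=True but the
    # wrapped sklearn model cannot produce log-probabilities.
    if use_logits and not hasattr(model, 'predict_log_proba'):
        raise ValueError(
            'use_logits=True requires a model that implements '
            'predict_log_proba, e.g. SVC(probability=True).'
        )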