Trusted-AI / adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
https://adversarial-robustness-toolbox.readthedocs.io/en/latest/
MIT License

Adversarial evasion attacks on regression #509

Open FelixNeutatz opened 4 years ago

FelixNeutatz commented 4 years ago

Dear all,

I am a big fan of this library! Unfortunately, support for adversarial attacks on regression models appears to be missing. You already seem to support regression estimators: https://adversarial-robustness-toolbox.readthedocs.io/en/latest/modules/estimators/regression.html

However, I did not see any attacks that work with regression models. Did I miss anything?

Potential Solution: According to Molloy et al. (http://www.research.ibm.com/labs/ireland/nemesis2018/pdf/tutorial.pdf), applying gradient-based methods such as FGM and BIM to regression appears straightforward. It would therefore be great if you could extend these attacks to regression models, since they are already implemented anyway :)
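
To illustrate the idea, here is a minimal sketch (not ART code; loss_gradient_fn is a hypothetical placeholder for the model's input gradient) of a single L-infinity FGM step against a regression loss such as the mean squared error:

import numpy as np

def fgm_regression(x, y, loss_gradient_fn, eps=0.1):
    # loss_gradient_fn(x, y) is assumed to return dL/dx with the same shape as x.
    grad = loss_gradient_fn(x, y)
    # Under the L-infinity norm, FGM moves every feature by eps in the
    # direction that increases the loss.
    return x + eps * np.sign(grad)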

Best regards, Felix

beat-buesser commented 4 years ago

Hi @FelixNeutatz Thank you very much! We are always very happy to hear from fans of ART!

I agree that it should be straightforward to attack regression models with, for example, gradient-based attacks. We had planned to introduce formal regression estimators with the new Estimator API in ART 1.3, but ultimately ran out of time before the release and only the empty mixin base class made it in; we are considering adding support in ART 1.4. Beyond implementing regression estimators, I think it mainly requires testing the attacks to make sure they don't contain implicit assumptions tied to classification tasks.

If you are interested in contributing to ART, I think this could be an interesting project to get started with, even if it initially covers only a single framework. We could definitely discuss approaches and would provide support.
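
As a rough illustration of the shape such a contribution could take (all names below are hypothetical, not the ART API), a regression estimator mainly needs to expose predictions and a loss gradient with respect to the input, which is essentially what gradient-based attacks such as FGM and BIM rely on:

import numpy as np

class HypotheticalScikitlearnRegressor:
    # Sketch only: wraps a fitted scikit-learn regressor for gradient-based attacks.

    def __init__(self, model):
        self._model = model

    def predict(self, x):
        return self._model.predict(x)

    def loss_gradient(self, x, y):
        # Gradient of a regression loss (e.g. mean squared error) w.r.t. x,
        # with the same shape as x; to be implemented per model type.
        raise NotImplementedError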

FelixNeutatz commented 4 years ago

Hi @beat-buesser,

great to hear that you are already thinking in that direction.

In a first prototype, I used ScikitlearnLogisticRegression as a template to implement a quick-and-dirty ScikitlearnLinearRegression class and adjusted the loss_gradient function as follows:

# Inside the adapted loss_gradient(self, x, y):
num_samples = x.shape[0]
gradients = np.zeros(x.shape)
y_pred = self._model.predict(X=x)

for i_sample in range(num_samples):
    gradients[i_sample, :] = 2 * x[i_sample] * (y[i_sample] - y_pred[i_sample])
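
For reference, for a fitted linear model f(x) = w·x + b with squared-error loss L = (f(x) - y)^2, the gradient with respect to the input is dL/dx = 2 * (f(x) - y) * w. A sketch of that closed form for a scikit-learn LinearRegression (not ART code, shown only for comparison) would be:

import numpy as np

# Reference sketch: analytic input gradient of the squared-error loss for a
# fitted scikit-learn LinearRegression, where f(x) = model.coef_ @ x + model.intercept_.
def squared_error_input_gradient(model, x, y):
    residual = model.predict(x) - y                      # shape: (n_samples,)
    return 2.0 * residual[:, np.newaxis] * model.coef_   # broadcast coef_ over samples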

When I try this on the Boston dataset, it does have a small effect:

from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from fastsklearnfeature.test.test_robustness.FGM_Regression import FastGradientMethod
from fastsklearnfeature.test.test_robustness.LinearRegressionSKlearn import ScikitlearnLinearRegression
from sklearn.metrics import r2_score

boston = load_boston()

X = boston.data
y = boston.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

regressor = ScikitlearnLinearRegression(model=model)
attack = FastGradientMethod(estimator=regressor, eps=0.1, batch_size=1)
X_test_attacked = attack.generate(X_test, y_test)

y_test_attacked = model.predict(X_test_attacked)
y_test_pred = model.predict(X_test)

print('r2 score on original data: ' + str(r2_score(y_true=y_test, y_pred=y_test_pred)))
print('r2 score on corrupted data: ' + str(r2_score(y_true=y_test, y_pred=y_test_attacked)))

This program results in:

r2 score on original data: 0.7261570836552478
r2 score on corrupted data: 0.5736966697751471

Do you think that my gradient computation is correct?

Thank you for your help.

Best regards, Felix

beat-buesser commented 4 years ago

Hi @FelixNeutatz Based on the import paths, I'm not completely sure about the details of the FastGradientMethod implementation. Is it the one from ART? Is fastsklearnfeature one of your libraries?

A first test to validate the attack would be to check if the maximum absolute difference between X_test_attacked and X_test corresponds to eps of the FGSM attack with norm=np.inf.
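
For example, one quick way to run this check, assuming the variable names from your example above (a sketch):

import numpy as np

# Under an L-infinity constraint no feature should change by more than eps,
# so the overall maximum absolute difference should be close to eps=0.1.
print('maximum absolute difference:', np.max(np.abs(X_test_attacked - X_test)))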

FelixNeutatz commented 4 years ago

Hi @beat-buesser,

I just quickly implemented the needed classes in a quick-and-dirty fashion in my current repository. There you can find the corresponding classes (FastGradientMethod, ScikitlearnRegressor, ScikitlearnLinearRegression) as well as an example.

Additionally, I added a check to measure the per-sample maximum absolute difference:

import numpy as np
import numpy.linalg as la

norm_type = np.inf
perts_norm = la.norm((X_test_attacked - X_test).reshape(X_test.shape[0], -1), ord=norm_type, axis=1)
print('maximum absolute difference:' + str(perts_norm))

This check yields the following array for epsilon=0.1:

maximum absolute difference:[0.1000061  0.10001221 0.10000069 0.1000061  0.10000977 0.1000061
 0.1000061  0.1000061  0.10000153 0.10001221 0.10001221 0.1000061
 0.10000076 0.10001221 0.1000061  0.10000076 0.1000061  0.10000076
 0.1000061  0.10001221 0.10001221 0.10001221 0.1000061  0.1000061
 0.10000038 0.10000038 0.10000854 0.10000076 0.1000061  0.10001221
 0.1000061  0.10001221 0.10001221 0.10001465 0.10001465 0.1000061
 0.1000061  0.10001221 0.10001343 0.1000061  0.1000061  0.10001221
 0.1000061  0.10001221 0.10001343 0.10001343 0.1000061  0.1000061
 0.10001221 0.10001221 0.10001221 0.1000061  0.10001221 0.1000061
 0.1000061  0.10001099 0.10000366 0.10001221 0.10001221 0.10001099
 0.1000061  0.1000061  0.10000977 0.1000061  0.1000061  0.1000061
 0.10000038 0.10001221 0.1000061  0.10000076 0.10001221 0.10000038
 0.10000183 0.10001465 0.10001465 0.10001221 0.1000061  0.1000061
 0.10001221 0.10000061 0.10001221 0.10000076 0.10001343 0.10000076
 0.1000061  0.1000061  0.10000038 0.10001221 0.1000061  0.1000061
 0.1000061  0.10000076 0.10001221 0.1000061  0.10000076 0.1000061
 0.10000076 0.10000076 0.10000061 0.10000076 0.1000061  0.1000061
 0.10000153 0.1000061  0.1000061  0.10000076 0.1000061  0.10001343
 0.1000061  0.1000061  0.10000076 0.10000671 0.10001221 0.1000061
 0.1000061  0.10000122 0.1000061  0.1000061  0.1000061  0.1000061
 0.1000061  0.10000732 0.10000187 0.1000061  0.10001099 0.10001221
 0.10001343 0.1000061  0.1000061  0.1000061  0.10001465 0.10001465
 0.1000061  0.10001221 0.1000061  0.10000153 0.1000061  0.10001221
 0.1000061  0.1000061  0.1000061  0.1000061  0.10001221 0.1000061
 0.10000076 0.10001221 0.1000061  0.10001465 0.10001465 0.1000061
 0.10000038 0.10000038 0.1000061  0.1000061  0.10001221 0.10001221
 0.10000076 0.10000366 0.10001099 0.10000488 0.1000061  0.1000061
 0.1000061  0.10001221 0.10001221 0.1000061  0.1000061 ]

beat-buesser commented 4 years ago

Hi @FelixNeutatz I think these numbers make sense: they show that the features have been changed by +/- 0.1, corresponding to eps=0.1, and the sign of each change should match the sign of the loss gradient for that feature. You also have to compare eps with X: if X is not normalised, eps=0.1 might be quite small for some of the features in X while being large for some of the others.
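
For example, one quick way to see this, assuming the variable names from the example above (a sketch): compare eps with the per-feature ranges, or scale the features to [0, 1] before fitting and attacking so that a single eps is comparable across features.

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# eps=0.1 is tiny for features with a wide range and large for narrow ones.
print('feature ranges:', X_train.max(axis=0) - X_train.min(axis=0))

# Scaling all features to [0, 1] makes a single eps comparable across features.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)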