Trusted-AI / adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
https://adversarial-robustness-toolbox.readthedocs.io/en/latest/
MIT License

ScikitlearnLogisticRegression, suspicious behavior of loss_gradient #1064

Closed marcinz99 closed 3 years ago

marcinz99 commented 3 years ago

Hi, my colleagues and I have been working on a contribution to the ART library (#1063), and we have spotted some unexpected behavior in ART's existing code that made us suspicious.

Bug description:

TL;DR: The ScikitlearnLogisticRegression loss_gradient implementation (LossGradientMixin) presumably produces wrong values. Its behavior appears inconsistent with the results of the same mixin method in other classifiers.

System information (please complete the following information):

More information:

We are working on an evasion attack aimed at imperceptible attacks on tabular data. We have to rely on the LossGradientMixin, because we need to compute the gradient of the loss function with respect to the data vector. This mixin provides exactly what we need, and several classifiers implement it. Surprisingly, in the case of the scikit-learn logistic regression wrapper, the loss_gradient method yields nonsensical results, even though the other wrappers we tested work fine.
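For context, here is a minimal sketch of how we invoke the wrapper (the module path and constructor arguments reflect our setup and may differ slightly between ART versions):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from art.estimators.classification.scikitlearn import ScikitlearnLogisticRegression

    # Fit a plain scikit-learn logistic regression on Iris (4 features, 3 classes).
    x, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000).fit(x, y)

    # Wrap it with ART and request the loss gradient w.r.t. the input samples.
    classifier = ScikitlearnLogisticRegression(model=model)
    y_one_hot = np.eye(3)[y]
    grads = classifier.loss_gradient(x[:3].astype(np.float32), y_one_hot[:3])
    print(grads.shape)  # (3, 4): one gradient entry per sample and feature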

We have tested it with the following settings:

and the following datasets:

This totals 8 cases. The only issues occur with ScikitlearnLogisticRegression; the remaining ones work fine.

You can also see an example of the loss_gradient values on three exemplary 4-feature records (based on runs on the Iris dataset).

Apart from the first (problematic) one, the others appear to exhibit reasonable similarities in their loss_gradient results. In the very first one, however, not even the signs match, which is a fairly decent sanity check showing that something is off.
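A quick way to quantify that mismatch is to measure the fraction of gradient entries whose signs agree between two implementations (a hypothetical helper, not part of ART):

    import numpy as np

    def sign_agreement(grad_a: np.ndarray, grad_b: np.ndarray) -> float:
        """Fraction of entries where the signs of two gradient arrays match."""
        return float(np.mean(np.sign(grad_a) == np.sign(grad_b)))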

Attachments: logistic_regression_wrapper_issue_notebooks.zip

[Image: logistic regression cross-entropy gradient w.r.t. the data vector]

Loss gradient of our wrapper for logistic regression, based on our analytical derivation (which seems to work fine):

    def loss_gradient(self, x: np.ndarray, y: np.ndarray, **kwargs) -> np.ndarray:
        """
        Compute the gradient of the loss function w.r.t. `x`.

        :param x: Sample input with shape as expected by the model.
        :param y: Target values (class labels) one-hot-encoded of shape `(nb_samples, nb_classes)`.
        :param kwargs: Unused.
        :return: Array of gradients of the same shape as `x`.
        """
        # Predicted class probabilities and their deviation from the one-hot targets.
        x_probas = self.model.predict_proba(x)
        errors = x_probas - y
        Thetas = self.model.coef_

        # Binary case: scikit-learn stores a single coefficient row, so expand it to
        # one row per class (the negative class uses the negated coefficients).
        if Thetas.shape[0] == 1:
            Thetas = np.append(Thetas * (-1), Thetas, axis=0)

        # Cross-entropy gradient w.r.t. the input: (p - y) @ Theta, divided by the number of classes.
        return (errors @ Thetas) / self.model.classes_.size
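For reference, a sketch of the derivation this implements for softmax regression with cross-entropy loss (the division by the number of classes is a scaling choice of the snippet above, not part of the textbook formula):

    % Softmax regression: z = \Theta x + b, p = \mathrm{softmax}(z),
    % cross-entropy loss L(x, y) = -\sum_k y_k \log p_k.
    \[
      \frac{\partial L}{\partial z} = p - y
      \qquad\Longrightarrow\qquad
      \nabla_x L(x, y) = \Theta^\top (p - y)
    \]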
beat-buesser commented 3 years ago

Hi @marcinz99 Great to hear from you, and thank you very much for notifying us of this suspicious behavior of ScikitlearnLogisticRegression.loss_gradient and for sharing your detailed analysis! We'll take a closer look at this issue soon.

beat-buesser commented 3 years ago

Hi @marcinz99

I have been able to obtain identical loss gradients from PyTorch and scikit-learn for a logistic regression model with the same weights and biases, using your code for the loss gradient calculation, with this notebook: logistic_regression_wrapper_issue_iris.ipynb.zip
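For anyone reproducing this, a rough sketch of how such a comparison can be set up (illustrative only, not the exact notebook; gradients may differ by a constant factor depending on the loss reduction used):

    import torch
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    # Fit scikit-learn logistic regression and copy its weights into a PyTorch linear layer.
    x, y = load_iris(return_X_y=True)
    sk_model = LogisticRegression(max_iter=1000).fit(x, y)

    linear = torch.nn.Linear(4, 3)
    with torch.no_grad():
        linear.weight.copy_(torch.tensor(sk_model.coef_, dtype=torch.float32))
        linear.bias.copy_(torch.tensor(sk_model.intercept_, dtype=torch.float32))

    # Cross-entropy loss gradient w.r.t. the inputs, via autograd.
    x_t = torch.tensor(x[:3], dtype=torch.float32, requires_grad=True)
    y_t = torch.tensor(y[:3], dtype=torch.long)
    loss = torch.nn.functional.cross_entropy(linear(x_t), y_t)
    loss.backward()
    print(x_t.grad)  # compare against the scikit-learn wrapper's loss_gradient output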

I have added your code for the loss gradient calculation in PR #1065, to be released with ART 1.6.2.