Trusted-AI / adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
https://adversarial-robustness-toolbox.readthedocs.io/en/latest/
MIT License

ScikitlearnLogisticRegression, suspicious behavior of loss_gradient #1064

Closed marcinz99 closed 3 years ago

marcinz99 commented 3 years ago

Hi, my colleagues and I have been working on a contribution to the ART library (#1063), and we have spotted some unexpected behavior in ART's existing code that made us suspicious.

Bug description:

TL;DR: The ScikitlearnLogisticRegression loss_gradient implementation (LossGradientMixin) presumably produces wrong values. Its behavior appears inconsistent with the results of the same mixin method in other classifiers.

System information (please complete the following information):

More information:

We are working on an evasion attack aimed at imperceptible attacks on tabular data. We have to rely on the LossGradientMixin, because we need to compute the gradient of the loss function with respect to the data vector. This mixin provides exactly what we need, and several classifiers implement it. Surprisingly, in the case of the scikit-learn logistic regression wrapper, the loss_gradient method yields nonsensical results, even though the other wrappers we tested work fine.
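For context, here is a minimal sketch of how we invoke the wrapper (the module path and constructor arguments reflect our setup and may differ slightly between ART versions):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from art.estimators.classification.scikitlearn import ScikitlearnLogisticRegression

    # Fit a plain scikit-learn logistic regression on Iris (4 features, 3 classes).
    x, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000).fit(x, y)

    # Wrap it with ART and request the loss gradient w.r.t. the input samples.
    classifier = ScikitlearnLogisticRegression(model=model)
    y_one_hot = np.eye(3)[y]
    grads = classifier.loss_gradient(x[:3].astype(np.float32), y_one_hot[:3])
    print(grads.shape)  # (3, 4): one gradient entry per sample and feature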

We have tested it with the following settings:

and the following datasets:

This totals 8 cases. The only issues occur with ScikitlearnLogisticRegression; the remaining ones work fine.

You can also see an example of the loss_gradient values on three exemplary 4-feature records (based on runs on the Iris dataset).

Apart from the first (problematic) one, the others appear to exhibit reasonable similarities in their loss_gradient results. In the very first one, however, not even the signs match, which is a fairly decent sanity check showing that something is off.
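A quick way to quantify that mismatch is to measure the fraction of gradient entries whose signs agree between two implementations (a hypothetical helper, not part of ART):

    import numpy as np

    def sign_agreement(grad_a: np.ndarray, grad_b: np.ndarray) -> float:
        """Fraction of entries where the signs of two gradient arrays match."""
        return float(np.mean(np.sign(grad_a) == np.sign(grad_b)))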

Attachments: logistic_regression_wrapper_issue_notebooks.zip

[Image: logistic regression cross-entropy gradient w.r.t. the data vector]

Loss gradient of our wrapper for logistic regression, based on our analytical derivation (which seems to work fine):

    def loss_gradient(self, x: np.ndarray, y: np.ndarray, **kwargs) -> np.ndarray:
        """
        Compute the gradient of the loss function w.r.t. `x`.

        :param x: Sample input with shape as expected by the model.
        :param y: Target values (class labels) one-hot-encoded of shape `(nb_samples, nb_classes)`.
        :param kwargs: Unused.
        :return: Array of gradients of the same shape as `x`.
        """
        # Predicted class probabilities and their deviation from the one-hot targets.
        x_probas = self.model.predict_proba(x)
        errors = x_probas - y
        Thetas = self.model.coef_

        # Binary case: scikit-learn stores a single coefficient row, so expand it to
        # one row per class (the negative class uses the negated coefficients).
        if Thetas.shape[0] == 1:
            Thetas = np.append(Thetas * (-1), Thetas, axis=0)

        # Cross-entropy gradient w.r.t. the input: (p - y) @ Theta, divided by the number of classes.
        return (errors @ Thetas) / self.model.classes_.size
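For reference, a sketch of the derivation this implements for softmax regression with cross-entropy loss (the division by the number of classes is a scaling choice of the snippet above, not part of the textbook formula):

    % Softmax regression: z = \Theta x + b, p = \mathrm{softmax}(z),
    % cross-entropy loss L(x, y) = -\sum_k y_k \log p_k.
    \[
      \frac{\partial L}{\partial z} = p - y
      \qquad\Longrightarrow\qquad
      \nabla_x L(x, y) = \Theta^\top (p - y)
    \]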
beat-buesser commented 3 years ago

Hi @marcinz99 Great to hear from you, and thank you very much for notifying us of this suspicious behavior of ScikitlearnLogisticRegression.loss_gradient and for sharing your detailed analysis! We'll take a closer look at this issue soon.

beat-buesser commented 3 years ago

Hi @marcinz99

I have been able to obtain identical loss gradients from PyTorch and scikit-learn for a logistic regression model with the same weights and biases, using your code for the loss gradient calculation, with this notebook: logistic_regression_wrapper_issue_iris.ipynb.zip
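For anyone reproducing this, a rough sketch of how such a comparison can be set up (illustrative only, not the exact notebook; gradients may differ by a constant factor depending on the loss reduction used):

    import torch
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    # Fit scikit-learn logistic regression and copy its weights into a PyTorch linear layer.
    x, y = load_iris(return_X_y=True)
    sk_model = LogisticRegression(max_iter=1000).fit(x, y)

    linear = torch.nn.Linear(4, 3)
    with torch.no_grad():
        linear.weight.copy_(torch.tensor(sk_model.coef_, dtype=torch.float32))
        linear.bias.copy_(torch.tensor(sk_model.intercept_, dtype=torch.float32))

    # Cross-entropy loss gradient w.r.t. the inputs, via autograd.
    x_t = torch.tensor(x[:3], dtype=torch.float32, requires_grad=True)
    y_t = torch.tensor(y[:3], dtype=torch.long)
    loss = torch.nn.functional.cross_entropy(linear(x_t), y_t)
    loss.backward()
    print(x_t.grad)  # compare against the scikit-learn wrapper's loss_gradient output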

I have added your code for the loss gradient calculation in PR #1065, to be released with ART 1.6.2.