Azure / counterfit

a CLI that provides a generic automation layer for assessing the security of ML models
MIT License
799 stars 128 forks

Help importing Target #81

Open Barraque opened 1 year ago

Barraque commented 1 year ago

Hi all,

I am writing to you regarding an issue I have been facing while using the Counterfit framework. Firstly, I want to commend you on developing such a powerful tool to test the security robustness of models. Your efforts in creating this framework are much appreciated.

I have been trying to import a new model/target into the Counterfit framework. Although I have successfully added the model and predicted on an image, I cannot seem to run an attack against it. I have been following the instructions provided in the documentation and on your GitHub, but I have not been able to resolve the issue.

I am using v1.1.0 in the Docker environment built from the Dockerfile in the repo. Here is the model I am trying to import into the framework:

import numpy as np
from tensorflow.keras.models import load_model

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

class SigmaNet:
    def __init__(self):
        self.name = 'sigmanet'
        self.model_filename = 'sigmanet.h5'
        try:
            self._model = load_model(self.model_filename)
            print('Successfully loaded', self.name)
        except (ImportError, ValueError, OSError):
            print('Failed to load', self.name)

    def color_process(self, imgs):
        if imgs.ndim < 4:
            imgs = np.array([imgs])
        imgs = imgs.astype('float32')
        mean = [125.307, 122.95, 113.865]
        std = [62.9932, 62.0887, 66.7048]
        for img in imgs:
            for i in range(3):
                img[:, :, i] = (img[:, :, i] - mean[i]) / std[i]
        return imgs

    def predict(self, img):
        processed = self.color_process(img)
        return self._model.predict(processed)

    def predict_one(self, img):
        confidence = self.predict(img)[0]
        predicted_class = np.argmax(confidence)
        return class_names[predicted_class]

I created a new CFTarget class so that I can use the model in Counterfit. Here is the code:

import numpy as np
from tensorflow.keras.models import load_model
from counterfit.core.targets import CFTarget
import cv2

class Htbdog(CFTarget):
    data_type = "image"
    target_name = "htbdog"
    endpoint = "/tmp/sigma/sigmanet.h5"
    data_path = "/tmp/sigma/dog.png"
    input_shape = (32, 32,3)
    output_classes = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
    classifier = "closed-box"
    X = []

    def load(self):
        self.model = load_model(self.endpoint)
        img = cv2.imread(self.data_path)
        #self.X.append(img.reshape(3,32,32))
        self.X = np.array([img]).astype('float32') / 255
        print(self.X.shape)

    def predict(self, x):
        #print(x,x.shape)

        confidence = self.model.predict(x[0].reshape(-1,32,32,3)*255)
        predicted_class = np.argmax(confidence)
        #print(confidence, predicted_class, self.output_classes[predicted_class] )
        #return self.output_classes[predicted_class]
        return confidence

As mentioned above, I have succeeded in predicting on the image sent into the Docker environment, but when I try to run the HopSkipJump attack I get this error: [image]

The model I used is attached as sigmanet.h5.txt; just remove the .txt extension.

I have found it difficult to find additional resources online that could help me troubleshoot the problem. Most of the resources available seem to be outdated and do not provide enough guidance on how to solve this particular issue. I would really appreciate it if you could provide me with some guidance or assistance in resolving this issue. Is there any additional documentation or resources that I can refer to? Also, could you please provide me with some steps or guidelines on how to import a new model/target and use an attack on them?

moohax commented 1 year ago

It's not an error. It is a failure of HSJ. The output means your target is written correctly and does work. The issue is with HSJ not being able to find an initial adversarial image.
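To illustrate what "not being able to find an initial adversarial image" means, here is a minimal sketch. It assumes an untargeted attack that begins by sampling random images until one is classified differently from the original input. This is not Counterfit's or ART's actual code; `find_initial_adversarial`, `constant_predict`, and `brightness_predict` are hypothetical names used only for this example.

```python
import numpy as np

def find_initial_adversarial(predict, x, n_tries=100, clip=(0.0, 1.0), seed=0):
    """Return a random image whose predicted label differs from x's, or None."""
    rng = np.random.default_rng(seed)
    original_label = int(np.argmax(predict(x[np.newaxis, ...])[0]))
    for _ in range(n_tries):
        # Draw a random image in the same value range the target expects.
        candidate = rng.uniform(clip[0], clip[1], size=x.shape).astype(np.float32)
        label = int(np.argmax(predict(candidate[np.newaxis, ...])[0]))
        if label != original_label:
            return candidate
    return None  # every sample got the original label: the search fails

def constant_predict(batch):
    """Toy target that always answers class 0, whatever the input."""
    out = np.zeros((len(batch), 2), dtype=np.float32)
    out[:, 0] = 1.0
    return out

def brightness_predict(batch):
    """Toy target: class 1 if the mean pixel value exceeds 0.25, else class 0."""
    bright = batch.reshape(len(batch), -1).mean(axis=1) > 0.25
    return np.stack([~bright, bright], axis=1).astype(np.float32)

x = np.zeros((32, 32, 3), dtype=np.float32)
print(find_initial_adversarial(constant_predict, x) is None)
print(find_initial_adversarial(brightness_predict, x) is not None)
```

If a target returns the same label for almost any input, for example because of a scaling or preprocessing mismatch, a search like this never succeeds, even though a single call to predict looks fine.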

Best advice I can give is to check the shapes are as you expect all the way through. Common errors come from incorrect batch sizes and/or channels being C, H, W vs H, W, C. This is especially true if predict works but an attack fails.
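One way to run that check is sketched below, assuming the 32x32x3 input shape and 10 output classes from the target above. `to_nhwc_batch`, `dummy_model_predict`, and `predict` are hypothetical helpers, and the dummy model stands in for the real Keras model so the snippet runs on its own. The aim is for predict to return one row of scores per input image, not just a row for `x[0]`.

```python
import numpy as np

def to_nhwc_batch(x, hwc_shape=(32, 32, 3)):
    """Return x as a float32 batch shaped (N, H, W, C), or raise ValueError."""
    h, w, c = hwc_shape
    x = np.asarray(x, dtype=np.float32)
    if x.ndim == 3:
        # Single image: add the batch dimension.
        x = x[np.newaxis, ...]
    if x.ndim != 4:
        raise ValueError(f"expected 3 or 4 dimensions, got shape {x.shape}")
    if x.shape[1:] == (c, h, w):
        # Channels-first (N, C, H, W) input: move channels last.
        x = np.transpose(x, (0, 2, 3, 1))
    if x.shape[1:] != (h, w, c):
        raise ValueError(f"unexpected image shape {x.shape[1:]}")
    return x

def dummy_model_predict(batch):
    """Stand-in for model.predict: returns 10 scores per image."""
    return batch.reshape(len(batch), -1)[:, :10]

def predict(x):
    batch = to_nhwc_batch(x)
    scores = dummy_model_predict(batch * 255)
    # One row of scores per input image, not only for the first one.
    assert scores.shape == (len(batch), 10), scores.shape
    return scores

print(predict(np.zeros((32, 32, 3))).shape)
print(predict(np.zeros((4, 3, 32, 32))).shape)
```

Printing or asserting shapes like this at the start and end of predict makes batch-size and channel-order mismatches visible as soon as the attack sends more than one sample.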