IBM / heart-library

Hardened Extension of the Adversarial Robustness Toolbox (HEART) supports assessment of adversarial AI vulnerabilities in Test & Evaluation workflows
MIT License

Black-box attacks not truly black-box #2

Open ap-buf opened 2 months ago

ap-buf commented 2 months ago

I spent some time looking into using the black-box attacks available in HEART/ART, but they seem to still require a model.

For example, the Query Efficient Black Box Attack (https://github.com/IBM/heart-library/blob/main/src/heart_library/attacks/evasion/query_efficient_bb_attack.py) uses the ART Base Class Estimator (https://adversarial-robustness-toolbox.readthedocs.io/en/latest/modules/estimators.html#base-class-estimator), whose first parameter is the model. I do not have access to the full model, as I have a true black-box setup with models running separately and accessible via API.

Can these be implemented without needing to provide a model?

beat-buesser commented 2 months ago

Hi @ap-buf,

Thank you very much for your interest in HEART/ART! That's a great question, and ART already provides such functionality. Your evaluation can become completely black-box if you use BlackBoxClassifier from art.estimators.classification. This estimator only requires a Python function that returns a classification vector. That function can wrap anything that produces a classification: remote access to an API, a random or human decision, a complex non-ML classification model, etc. ART provides 3 notebooks demonstrating BlackBoxClassifier in different scenarios.
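For concreteness, here is a minimal sketch of that pattern, pairing BlackBoxClassifier with a decision-based attack such as HopSkipJump. The endpoint URL, payload format, and response schema are hypothetical placeholders for whatever your API expects; only the ART classes are real.

```python
import numpy as np
import requests  # assumed HTTP client for the remote model
from art.estimators.classification import BlackBoxClassifier
from art.attacks.evasion import HopSkipJump

API_URL = "https://example.com/classify"  # hypothetical endpoint
NB_CLASSES = 10                           # adjust to your task
INPUT_SHAPE = (3, 32, 32)

def predict(x: np.ndarray) -> np.ndarray:
    """Query the remote model; return one-hot predictions of shape (n, NB_CLASSES)."""
    preds = np.zeros((x.shape[0], NB_CLASSES), dtype=np.float32)
    for i, sample in enumerate(x):
        resp = requests.post(API_URL, json={"image": sample.tolist()})
        preds[i, int(resp.json()["label"])] = 1.0  # hypothetical response schema
    return preds

# No model object needed: the estimator is defined entirely by the predict function.
classifier = BlackBoxClassifier(
    predict_fn=predict,
    input_shape=INPUT_SHAPE,
    nb_classes=NB_CLASSES,
    clip_values=(0.0, 1.0),
)

# HopSkipJump is decision-based, so it only ever sees the predicted labels.
attack = HopSkipJump(classifier=classifier, targeted=False, max_iter=10)
x = np.random.rand(1, *INPUT_SHAPE).astype(np.float32)  # stand-in for real inputs
x_adv = attack.generate(x=x)
```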

ap-buf commented 1 month ago

Thanks for the reply @beat-buesser. I did see that, but after diving in I realized it can only be used for classification (one label per image). What about object detection (n objects per image)? According to the paper, DPatch is a black-box attack on object detection, but the DPatch attack takes estimator: OBJECT_DETECTOR_TYPE, and those estimators all require the model as far as I can tell. Are there any options for black-box object detection attacks?

beat-buesser commented 1 month ago

Hi @ap-buf The attack described in the DPatch paper (https://arxiv.org/pdf/1806.02299) is white-box for the step of creating the adversarial patch, despite the paper mentioning black-box in its abstract and introduction. The creation of the patch is white-box because it requires loss gradients to optimise the patch. Their reference to black-box only covers the step of applying/inserting the already-generated patch into images that are then provided as input to an object detection model (which, in the black-box scenario, is a different object detection model than the one used in the white-box generation step). The tools of ART are not needed for this insertion step.
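To illustrate why no ART machinery is needed there, the insertion step is just array surgery. A toy sketch (the function name and the (C, H, W) layout are my own assumptions):

```python
import numpy as np

def insert_patch(image: np.ndarray, patch: np.ndarray, top: int, left: int) -> np.ndarray:
    """Paste an already-generated (C, h, w) patch into a copy of a (C, H, W) image."""
    out = image.copy()
    _, h, w = patch.shape
    out[:, top:top + h, left:left + w] = patch
    return out
```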

I would recommend using the robust DPatch algorithm of Lee & Kolter (https://arxiv.org/pdf/1906.11897), which fixes bugs in the original DPatch paper. I would also use ART's AdversarialPatchPyTorch to run the robust algorithm, taking advantage of a PyTorch-native implementation for improved performance.
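A minimal sketch of that setup, assuming a local torchvision Faster R-CNN as the white-box model for the generation step; the parameter values are illustrative, not tuned:

```python
import numpy as np
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from art.estimators.object_detection import PyTorchFasterRCNN
from art.attacks.evasion import AdversarialPatchPyTorch

# White-box surrogate for patch generation (torchvision >= 0.13 weights API).
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector = PyTorchFasterRCNN(
    model=model,
    clip_values=(0.0, 255.0),
    attack_losses=("loss_classifier", "loss_box_reg", "loss_objectness", "loss_rpn_box_reg"),
)

attack = AdversarialPatchPyTorch(
    estimator=detector,
    patch_shape=(3, 64, 64),  # illustrative patch size
    learning_rate=0.1,
    max_iter=100,
    batch_size=4,
)

x = np.random.randint(0, 255, size=(2, 3, 416, 416)).astype(np.float32)  # stand-in images
y = detector.predict(x)  # use the detector's own detections as targets
patch, patch_mask = attack.generate(x=x, y=y)

# Apply the trained patch; the patched images can then be sent to any
# (possibly different) detector for the black-box transfer step.
x_patched = attack.apply_patch(x, scale=0.3)
```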