SforAiDl / decepticonlp

Python Library for Robustness Monitoring and Adversarial Debugging of NLP models
MIT License

Implementing Attacker class #70

Open rajaswa opened 4 years ago

rajaswa commented 4 years ago
abheesht17 commented 4 years ago

Sounds interesting 🤩

rajaswa commented 4 years ago

The ultimate usage can be something like this:

from decepticonlp import attacker
from decepticonlp.transforms import transforms
import torch
from torch.utils.data import Dataset

#define adversarial transforms
tfms = transforms.Compose(
    [
        transforms.AddChar(),
        transforms.ShuffleChar("RandomWordExtractor", True),
        transforms.VisuallySimilarChar(),
        transforms.TypoChar("RandomWordExtractor", probability=0.5),
    ]
)

#Original dataset
class IMDB_Dataset(Dataset):
    def __init__(self):
        #some code

    def __len__(self):
        #some code

    def __getitem__(self, idx):
        text = data['review_text'][idx]    #get text from sample
        embeddings = getWordEmbeddings(text)    #convert to sequence of word embeddings
        label = torch.tensor(data['sentiment'][idx])    #sentiment label
        return embeddings, label

#Adversarial dataset
class IMDB_Adversarial_Dataset(Dataset):
    def __init__(self):
        #some code

    def __len__(self):
        #some code

    def __getitem__(self, idx):
        text = data['review_text'][idx]    #get text from sample
        adversarial_text = tfms(text)    #apply adversarial transform
        embeddings = getWordEmbeddings(adversarial_text)    #convert to sequence of word embeddings
        label = torch.tensor(data['sentiment'][idx])    #sentiment label
        return embeddings, label

#Load pre-trained model
imdb_classifier = torch.load("IMDB_Classifier.pth")
imdb_classifier.eval()

#Set up the attacker
IMDB_attacker = attacker.Attacker()
IMDB_attacker.model = imdb_classifier
IMDB_attacker.dataset = IMDB_Dataset()
IMDB_attacker.adversarial_dataset = IMDB_Adversarial_Dataset()
IMDB_attacker.criterion = ['accuracy', 'F1', 'BCELoss']

#Attack and get logs
IMDB_attacker.attack()
IMDB_attacker.get_criterion_logs()
IMDB_attacker.show_best_attacks()
IMDB_attacker.show_worst_attacks()

#Maybe more functionalities?
rajaswa commented 4 years ago

This will be a multi-step process, so don't limit yourself to the example functionalities shown above; please feel free to discuss more.

Sharad24 commented 4 years ago

Looks pretty good!

Can think of integration with https://github.com/huggingface/nlp as well.
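For instance, a rough sketch of how that could look (nlp.load_dataset is that library's API; reusing the tfms pipeline from the example above is an assumption):

import nlp  # pip install nlp

# pull IMDB from the hub instead of hand-rolling the data loading
imdb = nlp.load_dataset("imdb")
sample = imdb["train"][0]                # {'text': ..., 'label': ...}
adversarial_text = tfms(sample["text"])  # apply the adversarial transforms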

abheesht17 commented 4 years ago

Maybe we can also add functionality to plot graphs of loss, accuracy, etc. for both the original dataset and the adversarial dataset. More of a utility function than a necessary one, though.
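A rough sketch of such a utility (matplotlib; the log structure is an assumption, since get_criterion_logs() isn't implemented yet):

import matplotlib.pyplot as plt

def plot_criterion(name, original_log, adversarial_log):
    # overlay one metric on the original vs the adversarial dataset
    plt.plot(original_log, label="original")
    plt.plot(adversarial_log, label="adversarial")
    plt.xlabel("batch")
    plt.ylabel(name)
    plt.legend()
    plt.show()

# hypothetical usage, assuming per-dataset logs keyed by criterion
# logs = IMDB_attacker.get_criterion_logs()
# plot_criterion("accuracy", logs["original"]["accuracy"], logs["adversarial"]["accuracy"])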

Sharad24 commented 4 years ago

That would be helpful, although if we're doing that, a logger (TensorBoard, etc.) would be much better.
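Something along these lines, say (a minimal sketch with torch.utils.tensorboard; the dummy lists stand in for the attacker's real logs):

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/imdb_attack")

# dummy per-batch accuracies standing in for the attacker's real logs
clean_accs = [0.91, 0.90, 0.92]
adv_accs = [0.63, 0.58, 0.61]

for step, (clean, adv) in enumerate(zip(clean_accs, adv_accs)):
    # both curves on one chart, so the drop under attack is visible at a glance
    writer.add_scalars("accuracy", {"original": clean, "adversarial": adv}, step)
writer.close()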


rajaswa commented 4 years ago

Which one seems like the better option:

  1. Implement everything in a crude way, as shown in the example above, and then add the logger, huggingface/nlp, and other enhancements gradually over time
  2. Start the implementation with all these things taken into consideration beforehand

Sharad24 commented 4 years ago

To me, definitely point 1

Sharad24 commented 4 years ago

Maybe you can make a checklist here with the suggestions from this thread to keep track of the enhancements.
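Something like (items taken from the suggestions above):

- [ ] Crude Attacker class as per the usage example above
- [ ] Criterion logs and best/worst attack reporting
- [ ] Logger integration (TensorBoard, etc.)
- [ ] huggingface/nlp integration
- [ ] Plotting utilities for original vs adversarial metrics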

parantak commented 4 years ago

I am not sure if this would be feasible, but we could introduce an option for the different available embeddings. The attacker should be able to choose one, if possible, depending on their model.

abheesht17 commented 4 years ago

The user will define it in their own Dataset class; we don't have to worry about which embedding they have used.
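For example, a quick sketch (the class name and embed_fn are illustrative):

import torch
from torch.utils.data import Dataset

class IMDB_Glove_Dataset(Dataset):
    # the embedding choice lives entirely in the user's Dataset, not in the Attacker
    def __init__(self, data, embed_fn):
        self.data = data          # e.g. a dataframe with review_text / sentiment
        self.embed_fn = embed_fn  # any text -> tensor function the user supplies

    def __len__(self):
        return len(self.data["review_text"])

    def __getitem__(self, idx):
        text = self.data["review_text"][idx]
        return self.embed_fn(text), torch.tensor(self.data["sentiment"][idx])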

abheesht17 commented 4 years ago

What should I check in the test function of the Attacker class?

Sharad24 commented 4 years ago

Tests for attacker class would be 'integration' type tests rather than unit tests. You'd want to have common example cases where the attacker does what it is supposed to do.
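A sketch of what that could look like (pytest; the stand-in model/dataset and the Attacker API are assumptions, since the class isn't implemented yet):

import torch
import torch.nn as nn
from torch.utils.data import Dataset
from decepticonlp import attacker  # per the proposed usage above

class DummyDataset(Dataset):
    # tiny fixed dataset so the test runs fast and deterministically
    def __len__(self):
        return 4

    def __getitem__(self, idx):
        return torch.randn(8), torch.tensor(float(idx % 2))

def test_attacker_end_to_end():
    atk = attacker.Attacker()  # hypothetical class name
    atk.model = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())
    atk.dataset = DummyDataset()
    atk.adversarial_dataset = DummyDataset()  # would differ via transforms in practice
    atk.criterion = ["accuracy"]

    atk.attack()  # the point: all units run together through one API
    logs = atk.get_criterion_logs()
    assert "accuracy" in logs  # the pieces interfaced correctly end to end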


rajaswa commented 4 years ago

@Sharad24 can you provide some reference examples for these 'integration' type tests? Maybe from GenRL?

Sharad24 commented 4 years ago

Hmm, the integration tests in GenRL are not that good and somewhat brittle as of yet. Try reading this: https://www.fullstackpython.com/integration-testing.html

The goal is to check that the individual units used inside the attacker (in our case the different transforms, etc.) are finally able to work together through one API as intended. You don't really have to check the output from each individual unit for each different case, as that is already covered by their unit tests. Here, we only test how well the objects work together and whether there are any brittle points in their interfacing.

That said, if there are methods in the Attacker class that work as individual units, there should be unit tests for them.
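For example, something like this for a hypothetical standalone metric method:

import pytest
import torch
from decepticonlp import attacker  # per the proposed usage above

def test_compute_accuracy():
    atk = attacker.Attacker()  # hypothetical class name
    preds = torch.tensor([1.0, 0.0, 1.0, 1.0])
    labels = torch.tensor([1.0, 0.0, 0.0, 1.0])
    # compute_accuracy is a stand-in name for whichever metric helper gets implemented
    assert atk.compute_accuracy(preds, labels) == pytest.approx(0.75)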

abheesht17 commented 4 years ago

Thanks! Will do