cdpierse / transformers-interpret

Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code.

ZeroShotClassificationExplainer does not correctly explain ZeroShotClassificationPipeline results (single label) #84

Open ArneBinder opened 2 years ago

ArneBinder commented 2 years ago

In the case of a single label, the logic the ZeroShotClassificationExplainer uses to calculate the classification probability (see here) is different from the logic in the Huggingface ZeroShotClassificationPipeline (see here).

If this is intended, it should at least be documented somewhere. My use case is multi-label classification and I used the single-label approach to simulate it, but it took me some time to figure out that this cannot be used to explain ZeroShotClassificationPipeline predictions.
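For reference, I am using the explainer roughly as shown in the project README; the model name and labels below are placeholders rather than my actual data:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import ZeroShotClassificationExplainer

model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")

zero_shot_explainer = ZeroShotClassificationExplainer(model, tokenizer)
# attributions are computed per candidate label; text and labels here are placeholders
word_attributions = zero_shot_explainer(
    "text1",
    labels=["label1", "label2", "label3"],
)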

cdpierse commented 2 years ago

Hi @ArneBinder ,

The line you linked in Transformers Interpret is an edge case to accommodate models where there is only a single output node. It's a carryover from the sequence classifier and in all likelihood would never be used for zero-shot, due to the reliance on NLI models.

It's also worth pointing out that the zero-shot explainer is a subclass of both the SequenceClassificationExplainer and the QuestionAnsweringExplainer. Rather esoterically, I use the _get_preds() method from the QA explainer and then the zero-shot explainer's own _forward() method, which is a softmax w.r.t. the entailment class only.

The reason I don't include the contradiction score here is that, given the way we use the zero-shot explainer, the contradiction scores are not relevant. What we want to know is which class label made the NLI model fire the most w.r.t. entailment. At least that's how I interpret it; I might be missing something, though. How do you think the contradiction scores could be used for a zero-shot pipeline?
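Roughly, just as an illustration with made-up numbers rather than the actual implementation:

import torch

# loose sketch: each candidate label's hypothesis is scored by the NLI model, only its
# entailment score is kept, and the label with the highest entailment score wins;
# contradiction (and neutral) scores never enter the comparison
entailment_scores = torch.tensor([0.91, 0.12, 0.47])  # one made-up score per candidate label
predicted_label_idx = int(torch.argmax(entailment_scores))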

Also, if you are trying to explain a multi-label system I would suggest using our new MultiLabelClassificationExplainer.
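Basic usage is along these lines (the checkpoint below is just an example; substitute your own model):

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import MultiLabelClassificationExplainer

# example checkpoint; any sequence classification model can be used here
model = AutoModelForSequenceClassification.from_pretrained("j-hartmann/emotion-english-distilroberta-base")
tokenizer = AutoTokenizer.from_pretrained("j-hartmann/emotion-english-distilroberta-base")

explainer = MultiLabelClassificationExplainer(model, tokenizer)
# returns word attributions for every label the model defines, not just the top prediction
word_attributions = explainer("I loved the film but the ending made me cry.")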

ArneBinder commented 2 years ago

Thanks for the quick response (and also this very cool project btw)!

How do you think the contradiction scores could be used for a zero-shot pipeline?

As I said, the contradiction scores are used to normalize the entailment scores in the transformers.ZeroShotClassificationPipeline when the classes are independent (multi-label); see this code.
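For concreteness, that branch normalizes each label's entailment logit against its contradiction logit only. This is a paraphrase of the linked code with made-up numbers, not a verbatim copy:

import numpy as np

# illustrative logits for one (sequence, hypothesis) pair: [contradiction, neutral, entailment]
logits = np.array([0.3, -1.2, 2.0])
contradiction_id, entailment_id = 0, 2

# multi_label=True: softmax over (contradiction, entailment) only; the neutral logit is dropped
pair = logits[[contradiction_id, entailment_id]]
score = float(np.exp(pair[1]) / np.exp(pair).sum())  # P(entailment | {contradiction, entailment})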

What exactly can I do in this case? For now, I have the following code to get my predictions:

import pandas as pd
import transformers
from transformers import ZeroShotClassificationPipeline

sequences = pd.Series(["text1", "text2"])
classes = pd.Series(["label1", "label2", "label3"])
hypothesis_template: str = 'This example is {}.'

model: ZeroShotClassificationPipeline = transformers.pipeline("zero-shot-classification")

# call the pipeline with multi_label=True
model_output = model(
    sequences=sequences.to_list(), candidate_labels=classes.to_list(), hypothesis_template=hypothesis_template, multi_label=True
)
# convert to dataframe. note: ld_to_dl converts a list of dicts to a dict of lists
res_df = pd.DataFrame(ld_to_dl(model_output))
# use "labels" returned by model to rearrange result because order of scores is not fixed
classes_to_index = pd.Series(data=classes.index, index=classes.values)
predictions = res_df.apply(lambda row: pd.Series(row["scores"], index=classes_to_index[row["labels"]], dtype=float), axis=1)
predictions.index = sequences.index

# result is a dataframe with classes.index as columns, sequences.index as index and scores as entries 
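(ld_to_dl is not shown above; a minimal version matching the comment could look like this:)

from typing import Any, Dict, List

def ld_to_dl(list_of_dicts: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    # convert a list of dicts to a dict of lists, assuming all dicts share the same keys
    return {key: [d[key] for d in list_of_dicts] for key in list_of_dicts[0]}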

Is there an easy way to apply transformers-interpret?

EDIT: I just re-implemented the Huggingface transformers.ZeroShotClassificationPipeline logic for my purposes, using a transformers.TextClassificationPipeline instead (that pipeline just calls the base model and applies a softmax over the classes, see here); maybe this is a good starting point:

import numpy as np
import torch
import transformers
from transformers import TextClassificationPipeline

model_name = "facebook/bart-large-mnli"
pipeline: TextClassificationPipeline = transformers.pipeline("text-classification", model=model_name, tokenizer=model_name, return_all_scores=True)

input_texts = assemble_nli_input_texts(
    sequences=sequences.to_list(), labels=classes.to_list(), hypothesis_template=hypothesis_template,
    tokenizer=pipeline.tokenizer
)
# code below is equivalent to:
# pipeline_output = pipeline(input_texts, add_special_tokens=False)
# scores_dict = ld_to_dl([{x["label"]: x["score"] for x in seq_res} for seq_res in pipeline_output])
# named_scores = {k: np.array(v) for k, v in scores_dict.items()}
model_inputs = pipeline._parse_and_tokenize(input_texts, add_special_tokens=False)
with torch.no_grad():
    model_output = pipeline.model(**model_inputs)[0].cpu()
# softmax over the NLI model's output classes (contradiction, neutral, entailment)
model_output_softmax = torch.softmax(model_output, dim=-1)
named_scores = {label: model_output_softmax[:, idx].numpy() for label, idx in pipeline.model.config.label2id.items()}

scores_entailment_normalized = named_scores["entailment"] / (
        named_scores["entailment"] + named_scores["contradiction"])
scores_reshaped = scores_entailment_normalized.reshape((len(sequences), len(classes)))
predictions = pd.DataFrame(scores_reshaped, index=sequences.index, columns=classes.index)

with

from typing import List

from transformers import PreTrainedTokenizer
from transformers.pipelines.zero_shot_classification import ZeroShotClassificationArgumentHandler  # import path may vary by version

def assemble_nli_input_texts(
        sequences: List[str], labels: List[str], hypothesis_template: str, tokenizer: PreTrainedTokenizer,
) -> List[str]:
    args_parser = ZeroShotClassificationArgumentHandler()
    sequence_pairs = args_parser(sequences=sequences, labels=labels, hypothesis_template=hypothesis_template)
    encodings = tokenizer(
        sequence_pairs,
        add_special_tokens=True,
        return_tensors=None,
        padding=False,
        truncation=False,
    )
    # the original snippet ends here; decoding the encoded pairs back to text (special tokens
    # included) is presumably the intent, so the pipeline is later called with add_special_tokens=False
    return tokenizer.batch_decode(encodings["input_ids"])

How can I make use of MultiLabelClassificationExplainer to get explanations for the normalized output?