Hi @ArneBinder,
The line you linked in Transformers Interpret handles an edge case for models that have only a single output node. It is a carryover from the sequence classifier and would in all likelihood never be hit for zero-shot, because zero-shot relies on NLI models.
It's also worth pointing out that the zero-shot explainer is a subclass of both the SequenceClassificationExplainer and the QuestionAnsweringExplainer. Rather esoterically, I use the _get_preds() method from the QA explainer and then the zero-shot explainer's own _forward() method, which is a softmax w.r.t. the entailment class only.
The reason I don't include the contradiction score here is that, given the way we use the zero-shot explainer, the contradiction scores are not relevant. What we want to know is which class label made the NLI model fire the most w.r.t. entailment. At least that's how I interpret it; I might be missing something though. How do you think the contradiction scores could be used for a zero-shot pipeline?
Also, if you are trying to explain a multi-label system, I would suggest using our new MultiLabelClassificationExplainer.
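Roughly, it is used like this (a minimal sketch; the NLI model name and input text are just placeholders picked to match your setup):

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import MultiLabelClassificationExplainer

nli_model_name = "facebook/bart-large-mnli"  # placeholder, any sequence classification model works
model = AutoModelForSequenceClassification.from_pretrained(nli_model_name)
tokenizer = AutoTokenizer.from_pretrained(nli_model_name)

explainer = MultiLabelClassificationExplainer(model, tokenizer)
# returns word attributions for every output class (here: contradiction / neutral / entailment)
word_attributions = explainer("text1 This example is label1.")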
Thanks for the quick response (and also this very cool project btw)!
How do you think the contradiction scores could be used for a zero-shot pipeline?
As I said, the contradiction scores are used to normalize the entailment scores in the transformers.ZeroShotClassificationPipeline when classes are independent (multi-label); see this code.
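For concreteness, a minimal sketch of the normalization I mean, following my reading of the linked pipeline code (the logits here are made up; the label order contradiction/neutral/entailment follows the usual NLI convention):

import numpy as np

# made-up NLI logits of shape (num_sequences, num_labels, 3),
# last axis ordered as [contradiction, neutral, entailment]
logits = np.random.randn(2, 3, 3)
contradiction_id, entailment_id = 0, 2

# multi_label=True: per (sequence, label) pair, softmax over entailment vs. contradiction only,
# keeping the entailment probability as the score for that label
entail_contr = logits[..., [contradiction_id, entailment_id]]
scores = np.exp(entail_contr) / np.exp(entail_contr).sum(-1, keepdims=True)
multi_label_scores = scores[..., 1]  # shape (num_sequences, num_labels)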
What exactly can I do in this case? For now, I have the following code to get my predictions:
import pandas as pd
import transformers
from transformers import ZeroShotClassificationPipeline

sequences = pd.Series(["text1", "text2"])
classes = pd.Series(["label1", "label2", "label3"])
hypothesis_template: str = "This example is {}."
model: ZeroShotClassificationPipeline = transformers.pipeline("zero-shot-classification")
# call the pipeline with multi_label=True
model_output = model(
    sequences=sequences.to_list(),
    candidate_labels=classes.to_list(),
    hypothesis_template=hypothesis_template,
    multi_label=True,
)
# convert to dataframe. note: ld_to_dl converts a list of dicts to a dict of lists
res_df = pd.DataFrame(ld_to_dl(model_output))
# use the "labels" returned by the model to rearrange the result, because the order of the scores is not fixed
classes_to_index = pd.Series(data=classes.index, index=classes.values)
predictions = res_df.apply(
    lambda row: pd.Series(row["scores"], index=classes_to_index[row["labels"]], dtype=float), axis=1
)
predictions.index = sequences.index
# result is a dataframe with classes.index as columns, sequences.index as index and scores as entries
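ld_to_dl is a small helper that is not shown above; a minimal version matching the comment (list of dicts to dict of lists) could look like this:

from typing import Any, Dict, List

def ld_to_dl(ld: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    # convert a list of dicts (all sharing the same keys) into a dict of lists
    return {key: [d[key] for d in ld] for key in ld[0]}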
Is there an easy way to apply transformers-interpret here?
EDIT: I just re-implemented the Huggingface transformers.ZeroShotClassificationPipeline logic for my purpose, using a transformers.TextClassificationPipeline instead (that pipeline just calls the base model and applies a softmax over the classes, see here). Maybe this is a good starting point:
import numpy as np
import pandas as pd
import torch
import transformers
from transformers import TextClassificationPipeline

model_name = "facebook/bart-large-mnli"
pipeline: TextClassificationPipeline = transformers.pipeline(
    "text-classification", model=model_name, tokenizer=model_name, return_all_scores=True
)
input_texts = assemble_nli_input_texts(
    sequences=sequences.to_list(),
    labels=classes.to_list(),
    hypothesis_template=hypothesis_template,
    tokenizer=pipeline.tokenizer,
)
# code below is equivalent to:
# pipeline_output = pipeline(input_texts, add_special_tokens=False)
# scores_dict = ld_to_dl([{x["label"]: x["score"] for x in seq_res} for seq_res in pipeline_output])
# named_scores = {k: np.array(v) for k, v in scores_dict.items()}
model_inputs = pipeline._parse_and_tokenize(input_texts, add_special_tokens=False)
with torch.no_grad():
    model_output = pipeline.model(**model_inputs)[0].cpu()
# softmax over the three NLI classes
model_output_softmax = torch.softmax(model_output, dim=-1)
named_scores = {label: model_output_softmax[:, idx].numpy() for label, idx in pipeline.model.config.label2id.items()}
# normalize the entailment scores with the contradiction scores (as in the multi-label zero-shot pipeline)
scores_entailment_normalized = named_scores["entailment"] / (
    named_scores["entailment"] + named_scores["contradiction"]
)
# one row per input sequence, one column per candidate class
scores_reshaped = scores_entailment_normalized.reshape((len(sequences), len(classes)))
predictions = pd.DataFrame(scores_reshaped, index=sequences.index, columns=classes.index)
with the following helper:
from typing import List

from transformers import PreTrainedTokenizer
# note: depending on the transformers version this handler lives in transformers.pipelines
# or transformers.pipelines.zero_shot_classification
from transformers.pipelines import ZeroShotClassificationArgumentHandler


def assemble_nli_input_texts(
    sequences: List[str], labels: List[str], hypothesis_template: str, tokenizer: PreTrainedTokenizer,
) -> List[str]:
    # expand every sequence with every hypothesis (the template formatted with each label)
    args_parser = ZeroShotClassificationArgumentHandler()
    sequence_pairs = args_parser(sequences=sequences, labels=labels, hypothesis_template=hypothesis_template)
    # tokenize the premise/hypothesis pairs with special tokens included
    encodings = tokenizer(
        sequence_pairs,
        add_special_tokens=True,
        return_tensors=None,
        padding=False,
        truncation=False,
    )
    # the original snippet ends here; a plausible completion is to decode back into single input texts
    return tokenizer.batch_decode(encodings["input_ids"])
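For the example inputs above, the argument handler expands every sequence with every formatted hypothesis; if I read it correctly the order is sequence-major, which is what makes the reshape to (len(sequences), len(classes)) at the end valid:

# expected layout of sequence_pairs for 2 sequences x 3 classes (sequence-major order):
# [["text1", "This example is label1."],
#  ["text1", "This example is label2."],
#  ["text1", "This example is label3."],
#  ["text2", "This example is label1."],
#  ["text2", "This example is label2."],
#  ["text2", "This example is label3."]]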
How can I make use of the MultiLabelClassificationExplainer to get explanations for the normalized output?
In the case of a single label, the logic to calculate the classification probability with the ZeroShotClassificationExplainer (see here) is different from the logic in the Huggingface ZeroShotClassificationPipeline (see here): the ZeroShotClassificationPipeline calculates the softmax over the entailment and contradiction scores and returns the resulting value for entailment, but the ZeroShotClassificationExplainer returns just the sigmoid of the entailment score. At the very least, if this is intended, it should be documented somewhere. My use case is multi-label classification and I used the single-label approach to simulate that, but it took me some time to figure out that this does not work for explaining ZeroShotClassificationPipeline predictions.
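To illustrate the difference with made-up numbers (just the entailment and contradiction logits of a single premise/hypothesis pair):

import numpy as np

# made-up logits for one premise/hypothesis pair
z_entailment, z_contradiction = 2.0, -1.0

# what the ZeroShotClassificationExplainer returns in the single-label case: sigmoid of the entailment logit
explainer_score = 1.0 / (1.0 + np.exp(-z_entailment))  # ~0.88

# what the ZeroShotClassificationPipeline returns: softmax over entailment vs. contradiction
pipeline_score = np.exp(z_entailment) / (np.exp(z_entailment) + np.exp(z_contradiction))  # ~0.95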