IBM / fastfit

FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes

Defining new/custom labels at inference time (zero-shot?) #3

Open Taytay opened 7 months ago

Taytay commented 7 months ago

First, this is great! Thank you for publishing the results and code!

This is my favorite part of the paper:

During inference, when provided with a new text, we classify it to the most similar class with respect to a similarity metric S. This method draws inspiration from the way inference is conducted in retrieval systems, eliminating the need for a classification head and aligning the training and inference objectives.

I love that this approach doesn't require a predetermined classification head!
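
To make the quoted inference step concrete, here is a minimal sketch of retrieval-style classification, assuming a plain encoder with mean pooling and cosine similarity; this is an illustration of the idea, not FastFit's actual implementation:

import torch
from transformers import AutoModel, AutoTokenizer

# Illustration only: embed texts and candidate labels with the same encoder,
# then pick the label whose embedding is most similar to the text's.
tok = AutoTokenizer.from_pretrained("roberta-large")
enc = AutoModel.from_pretrained("roberta-large")

def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state      # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)     # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)      # mean pooling

labels = ["pending top up", "card payment not recognised"]
text_emb = embed(["I have a pending top up"])        # (1, dim)
label_emb = embed(labels)                            # (2, dim)
scores = torch.nn.functional.cosine_similarity(text_emb, label_emb)
print(labels[scores.argmax().item()])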

Would I be right to presume, then, that I could provide new labels at inference time? If those labels bear a resemblance to my training set, I think it would do quite well. If they don't, it would "revert" to picking the most similar label, which should still work, right? That would make this a capable zero-shot classifier as well, right?

Here is my initial experiment with it. It appears to "work", although of course its confidence is much lower when the new labels don't overlap semantically with the original banking labels. I presume you could address this by training a more generalized FastFit model?

Am I understanding this correctly?

from typing import List

from fastfit import FastFit
from transformers import AutoTokenizer, pipeline
from transformers.pipelines.base import Pipeline

# Assumes we ran the README example that trains a model on Banking-77 and saved it as "fast-fit":

model = FastFit.from_pretrained("fast-fit")
tokenizer = AutoTokenizer.from_pretrained("roberta-large")

classifier: Pipeline = pipeline("text-classification", model=model, tokenizer=tokenizer, device="cuda")

print("\n\nOriginal classifier:")

# Classify a few inputs with the original Banking-77 label set
inputs = ["I need to pay off my card", "What is my PIN?", "I have a pending top up"]
outputs = classifier(inputs)
# Print each input alongside its prediction:
for inp, out in zip(inputs, outputs):
    print(f"Input: {inp}\nOutput: {out}")

def configure_model_with_new_labels(model, new_labels: List[str]):
    # Tokenize the label texts ("documents" is what FastFit calls the labels)
    tokenized_labels = tokenizer(new_labels, padding=True, truncation=True, return_tensors="pt")
    input_ids = tokenized_labels["input_ids"]
    attention_mask = tokenized_labels["attention_mask"]

    # Set the tokenized documents on the model.
    # Note: "set_documetns" (sic) is the method's actual name in the FastFit library.
    model.set_documetns((input_ids, attention_mask))

    # Create and update label mappings
    label_to_id = {label: idx for idx, label in enumerate(new_labels)}
    id_to_label = {idx: label for label, idx in label_to_id.items()}

    # Update model configuration for label mappings
    model.config.label2id = label_to_id
    model.config.id2label = id_to_label
    model.config.num_labels = len(new_labels)

    return model

def test_model_with_labels_and_input(classifier, new_labels, inputs):
    print("\n************")
    print("Configuring with new labels: ", new_labels)
    # Configure the model with new labels
    configure_model_with_new_labels(classifier.model, new_labels)

    # Run the model with the new labels
    outputs = classifier(inputs)

    # Print the inputs and outputs formatted together
    for inp, out in zip(inputs, outputs):
        print(f"Input: {inp}\nOutput: {out}")

test_model_with_labels_and_input(
    classifier,
    # New labels are really close to two of the original labels
    new_labels=["I have a pending card payment", "my pin is blocked"],
    inputs=[
        "I need to pay off my card",
        "What is my PIN?",
        "I have a pending top up",  # this last one is not in the new labels, but is very close to an original label. Let's see if it works too.
    ],
)

# Now some very novel labels:
test_model_with_labels_and_input(classifier, ["positive", "negative"], ["I love you", "I hate it."])

test_model_with_labels_and_input(classifier, ["sports", "politics"], ["Hockey is just the best", "I need to vote", "Vote on the new team captain"])

# Prints:

# Original classifier:
# Input: I need to pay off my card
# Output: {'label': 'card payment not recognised', 'score': 0.637493908405304}
# Input: What is my PIN?
# Output: {'label': 'get physical card', 'score': 0.29754528403282166}
# Input: I have a pending top up
# Output: {'label': 'pending top up', 'score': 0.8865770101547241}

# ************
# Configuring with new labels:  ['I have a pending card payment', 'my pin is blocked']
# Input: I need to pay off my card
# Output: {'label': 'I have a pending card payment', 'score': 0.9976436495780945}
# Input: What is my PIN?
# Output: {'label': 'my pin is blocked', 'score': 0.9877238273620605}
# Input: I have a pending top up
# Output: {'label': 'I have a pending card payment', 'score': 0.9031936526298523}

# ************
# Configuring with new labels:  ['positive', 'negative']
# Input: I love you
# Output: {'label': 'positive', 'score': 0.5160248875617981}
# Input: I hate it.
# Output: {'label': 'negative', 'score': 0.5863736271858215}

# ************
# Configuring with new labels:  ['sports', 'politics']
# Input: Hockey is just the best
# Output: {'label': 'sports', 'score': 0.7964810729026794}
# Input: I need to vote
# Output: {'label': 'politics', 'score': 0.6664907336235046}
# Input: Vote on the new team captain
# Output: {'label': 'politics', 'score': 0.7508028149604797}
AsafYehudai commented 7 months ago

Hi @Taytay,

Thanks for your feedback!

It is indeed a cool feature of our method.

I think it can be relevant for cases where new classes need to be introduced only at inference time. In the intent-detection literature this is called out-of-domain (or out-of-distribution, OOD) detection, so that is one clear case where this can be useful.

Additionally, yes, I agree that our method teaches the model to recognize text-label similarity, as IR models do, so it could gain from more general training and become a zero-shot classifier. Maybe a simple experiment would be to concatenate all the FewMany datasets (maybe 5/10-shot), train on that, and see whether it improves the confidence scores.
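
Something like this sketch, assuming the FastFitTrainer API from the repo README; the dataset list is a placeholder for the actual FewMany sets, and I'm assuming they share the text/label_text column layout of FastFit/banking_77:

from datasets import DatasetDict, concatenate_datasets, load_dataset
from fastfit import FastFitTrainer, sample_dataset

# Placeholder list: substitute the full FewMany benchmark datasets here.
names = ["FastFit/banking_77"]
splits = [load_dataset(n) for n in names]

# Pool the datasets into one training/eval corpus
dataset = DatasetDict({
    "train": concatenate_datasets([s["train"] for s in splits]),
    "validation": concatenate_datasets([s["test"] for s in splits]),
    "test": concatenate_datasets([s["test"] for s in splits]),
})

# Down-sample to 5-shot per label, per the suggestion above
dataset["train"] = sample_dataset(dataset["train"], label_column="label_text", num_samples_per_label=5)

trainer = FastFitTrainer(
    model_name_or_path="roberta-large",
    text_column_name="text",
    label_column_name="label_text",
    num_train_epochs=40,
    dataset=dataset,
)
model = trainer.train()
model.save_pretrained("fast-fit-general")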

CC @elronbandel