Open azaismarc opened 6 months ago
Hi,
correct me if I'm wrong, but I think this is already implemented in the SoftmaxLoss class; you just have to set all of the concatenation_sent_* parameters to True:
class SoftmaxLoss(nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        sentence_embedding_dimension: int,
        num_labels: int,  # set to 2 if you want binary classification
        concatenation_sent_rep: bool = True,  # set to True if you want u and v
        concatenation_sent_difference: bool = True,  # set to True if you want abs(u-v)
        concatenation_sent_multiplication: bool = False,  # set to True if you want (u*v)
        loss_fct: Callable = nn.CrossEntropyLoss(),
    ):
Implementation aside, the paper reminds me a lot of approaches like SphereFace2 and of exploiting hyperspherical embeddings for OOD detection, which are pretty cool.
Yeah, the loss is very similar to SoftmaxLoss, but they add an FFN with a ReLU activation before the classifier.
Oh I see. The simplest way to do this is a bit hacky: you can just use the SoftmaxLoss and then replace its classifier attribute. Something along these lines (haven't tested this though):
from torch import nn
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import SoftmaxLoss

model = SentenceTransformer("bert-base-cased")
loss = SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=1,
    concatenation_sent_rep=True,
    concatenation_sent_difference=True,
    concatenation_sent_multiplication=True,
    loss_fct=nn.BCEWithLogitsLoss(),
)
loss.classifier = nn.Sequential(
    nn.Linear(
        in_features=loss.classifier.in_features,
        out_features=model.get_sentence_embedding_dimension(),
        bias=True,
    ),
    nn.ReLU(),
    nn.Dropout(p=0.3),  # not sure what value is used in the paper
    nn.Linear(
        in_features=model.get_sentence_embedding_dimension(),
        out_features=1,
        bias=True,
    ),
)
loss.classifier.to(model.device)
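One caveat (also untested, and based on my reading of the current SoftmaxLoss.forward, which passes labels.view(-1) straight to loss_fct): the single-logit classifier outputs shape (batch, 1), so nn.BCEWithLogitsLoss will likely complain about mismatched shapes and integer labels. A small wrapper around the loss function plus float labels should work around that; here is a rough sketch of how you could then train it with the usual fit API (toy data and values are made up):

import torch.nn.functional as F
from torch.utils.data import DataLoader
from sentence_transformers import InputExample

# Flatten the (batch, 1) logits and cast the labels to float so that the
# binary cross-entropy accepts them (assumes SoftmaxLoss passes labels.view(-1)).
loss.loss_fct = lambda logits, labels: F.binary_cross_entropy_with_logits(
    logits.view(-1), labels.float()
)

# Hypothetical toy data: (text, topic) pairs with a binary relevance label.
train_examples = [
    InputExample(texts=["The hotel has a rooftop pool.", "Facilities"], label=1.0),
    InputExample(texts=["The hotel has a rooftop pool.", "Breakfast"], label=0.0),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

model.fit(train_objectives=[(train_dataloader, loss)], epochs=1, warmup_steps=0)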
Thanks, I implemented it the same way!
However, I find the idea of computing a similarity score with a classifier, as a Cross-Encoder does, but on top of a Bi-Encoder so that some embeddings can be cached, quite interesting. IMO, the classifier can capture more information than cosine similarity.
Maybe a new feature for SetFit or Sentence Transformers?
Since it's such a small change to the SoftmaxLoss, maybe it's better to just expose classifier
as a parameter? I can create a PR if needed
cc @tomaarsen
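To illustrate the idea (just a sketch, not the actual PR, and the class name is made up), a thin subclass that accepts the head would already cover it:

from typing import Optional

import torch.nn as nn
from sentence_transformers.losses import SoftmaxLoss


class SoftmaxLossWithCustomHead(SoftmaxLoss):
    """Hypothetical variant of SoftmaxLoss that lets the caller swap in a custom classification head."""

    def __init__(self, *args, classifier: Optional[nn.Module] = None, **kwargs):
        super().__init__(*args, **kwargs)
        if classifier is not None:
            # Replace the default single linear layer with the user-supplied head.
            self.classifier = classifier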
Paper: https://aclanthology.org/2023.emnlp-industry.10.pdf
Hi
Recently, Booking.com shared a new architecture called Text2Topic. This architecture takes as input a text and a topic with its description, and outputs a score between 0 and 1.
To achieve this, the architecture implements a new loss function, which resembles the SoftMax loss but for binary classification.
In my personal experience, the loss function also improves the embeddings of a pre-trained model, similar to a form of domain adaptation.
I would like to hear opinions on the relevance of their approach and on whether an implementation of such a loss function would be feasible.
Thanks
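For reference, here is a rough sketch of the architecture as I understand it from the paper (class name, hidden size and dropout are my own guesses): the text and the topic with its description are bi-encoded, the embeddings are combined, and a small FFN with ReLU outputs a score between 0 and 1. Topic embeddings could also be cached and reused across many texts.

import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer


class Text2TopicStyleScorer(nn.Module):
    """Rough sketch: bi-encode the text and the topic (+ description), concatenate
    [u, v, |u - v|, u * v], then score the pair with a small FFN + ReLU head."""

    def __init__(self, model_name: str = "bert-base-cased"):
        super().__init__()
        self.encoder = SentenceTransformer(model_name)
        dim = self.encoder.get_sentence_embedding_dimension()
        self.head = nn.Sequential(
            nn.Linear(4 * dim, dim),
            nn.ReLU(),
            nn.Dropout(p=0.3),  # dropout value is a guess
            nn.Linear(dim, 1),
        )

    def forward(self, texts: list[str], topics: list[str]) -> torch.Tensor:
        # Topic embeddings could be precomputed once and reused across texts.
        u = self.encoder.encode(texts, convert_to_tensor=True)
        v = self.encoder.encode(topics, convert_to_tensor=True)
        features = torch.cat([u, v, torch.abs(u - v), u * v], dim=1)
        features = features.to(next(self.head.parameters()).device)
        return torch.sigmoid(self.head(features)).squeeze(-1)


scorer = Text2TopicStyleScorer()
print(scorer(["The hotel has a rooftop pool."], ["Facilities: amenities offered by the property"]))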