UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0
15.45k stars 2.5k forks

Can we input more than one label to `InputExample`? #2264

Open dafajon opened 1 year ago

dafajon commented 1 year ago

Example usage: for multilabel or multiclass classification, or for multi-target regression, can we instantiate an example like this?

InputExample(texts=["sentence1", "sentence2"], label=[cat1_index, cat5_index, cat6_index])

dafajon commented 1 year ago

I tried and succeeded, yet there is a catch.

Initializing an InputExample instance with a list of targets raises no error. In my case, the problem is multi-target regression for the CommonLit competition on Kaggle.

InputExample(texts=["Write a summary for this text", "This is the summary"], label=[0.3, 0.24])

Although initialization succeeds, training fails: the regression loss expects float targets but receives a tensor of type torch.long. A custom loss would work, but with more than one target num_labels is not 1, so the conversion on line 82

labels = torch.tensor(labels, dtype=torch.float if self.config.num_labels == 1 else torch.long).to(self._target_device)

casts the targets to integers and discards their fractional part before any loss sees them.
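The effect of that dtype choice can be reproduced in isolation. The snippet below is a minimal sketch using only `torch.tensor`; the label values are made up for illustration and the variable names are not taken from the library.

```python
import torch

# Two examples with two regression targets each (illustrative values).
labels = [[0.3, 0.24], [1.7, 2.9]]

# What happens when num_labels != 1: the float targets are cast to integers.
as_long = torch.tensor(labels, dtype=torch.long)

# What a multi-target regression loss would need instead.
as_float = torch.tensor(labels, dtype=torch.float)

print(as_long)   # fractional parts are gone
print(as_float)  # targets are preserved
```

With the long dtype, targets such as 0.3 and 0.24 both become 0, so the model would be trained against the wrong values even if the loss accepted the tensor.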

This cannot be avoided, because the fit call automatically sets the collate function (at line 103), and that collate function is where line 82 lives.

I suggest a workaround: either add a problem-type setting, or allow overriding with a custom collate function that can process the input examples. We can already pass a collator to the DataLoader object, but it gets overridden at line 103, so an `if not None` check would suffice.
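A rough sketch of the second option follows. Everything in it is an assumption made for illustration: `Example` is a hypothetical stand-in for InputExample, `toy_tokenize` replaces a real tokenizer, and `make_regression_collate` and `choose_collate` are invented names, not part of sentence-transformers.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

import torch


@dataclass
class Example:
    """Hypothetical stand-in for InputExample (texts plus a list of targets)."""
    texts: List[str]
    label: Sequence[float]


def make_regression_collate(tokenize: Callable, device: str = "cpu") -> Callable:
    """Build a collate function that keeps multi-target labels as floats."""
    def collate(batch):
        # Regroup texts so column i holds the i-th sentence of every example.
        columns = [list(col) for col in zip(*(ex.texts for ex in batch))]
        features = tokenize(*columns)
        # Always float, regardless of how many targets each example has.
        labels = torch.tensor(
            [list(ex.label) for ex in batch], dtype=torch.float
        ).to(device)
        return features, labels
    return collate


def choose_collate(custom: Optional[Callable], default: Callable) -> Callable:
    """The proposed guard: use the default only when no override is given."""
    return custom if custom is not None else default


def toy_tokenize(first, second):
    # Assumption: returns sentence lengths instead of real token ids.
    return {"len_a": [len(s) for s in first], "len_b": [len(s) for s in second]}


batch = [
    Example(["Write a summary for this text", "This is the summary"], [0.3, 0.24]),
    Example(["Another prompt", "Another summary"], [0.5, 0.1]),
]
collate_fn = choose_collate(make_regression_collate(toy_tokenize), default=lambda b: b)
features, labels = collate_fn(batch)
print(labels.dtype, labels.shape)
```

One design note: a PyTorch DataLoader fills in its own default collate function when none is passed, so its `collate_fn` attribute is not None in practice. For that reason the sketch takes the override as an explicit argument instead of inspecting the loader.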