aliosia opened this issue 3 years ago
Hi @aliosia You usually get much better results if you use Transformers directly and fine-tune on your sentiment classification task.
I don't know who brought this idea up in the community, but it was never a good idea to first map a sentence to an embedding and then use this embedding as the (only) feature for a classifier like logistic regression. Classifiers working directly on the text data have always outperformed these sentence embedding -> classifier constructions.
So for your case I recommend fine-tuning directly for classification, without a sentence embedding in between.
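The recommendation above (train the encoder and the classification head together, rather than fitting a logistic regression on frozen sentence embeddings) can be sketched with a toy model. The encoder here is a stand-in (an embedding layer with mean pooling), not SBERT's actual API; all names and sizes are illustrative:

```python
import torch
import torch.nn as nn

# Toy stand-in for "fine-tune directly for classification": encoder and
# head are optimized jointly, so gradients update the encoder weights too,
# instead of freezing sentence embeddings and training only a classifier.
class ToyClassifier(nn.Module):
    def __init__(self, vocab_size=100, dim=16, num_labels=2):
        super().__init__()
        self.encoder = nn.Embedding(vocab_size, dim)  # pretend transformer body
        self.head = nn.Linear(dim, num_labels)        # softmax head (via CE loss)

    def forward(self, token_ids):
        pooled = self.encoder(token_ids).mean(dim=1)  # mean-pool over tokens
        return self.head(pooled)

torch.manual_seed(0)
model = ToyClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)  # updates ALL weights
loss_fn = nn.CrossEntropyLoss()

x = torch.randint(0, 100, (8, 5))  # batch of 8 "sentences", 5 token ids each
y = torch.randint(0, 2, (8,))
for _ in range(20):
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
logits = model(x)
```

With a real transformer, the same pattern applies: pass the encoder's parameters to the optimizer along with the head's, so fine-tuning reaches all the way down.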
Thanks a lot for your explanation @nreimers I will surely test the other way more, but in my first try, I got better results with SBERT features.
Also, the idea of first training with Siamese networks (contrastive loss or triplet loss) in an unsupervised way and then fine-tuning with the logistic loss for classification is not new; I remember that for nearly two years (around 2015) the state-of-the-art face classification models used both loss functions together. Hence, starting from a pre-trained network and fine-tuning with a classification loss seems reasonable.
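The two-stage recipe described above can be sketched in a few lines: (1) contrastive pretraining of an encoder with a triplet loss, then (2) fine-tuning encoder plus classification head with cross-entropy. The encoder is a toy stand-in (one linear layer); this is an illustrative sketch, not the actual 2015 face-recognition setup:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
encoder = nn.Linear(10, 4)  # toy encoder: 10-dim input -> 4-dim embedding

# Stage 1: triplet loss pulls anchor/positive embeddings together and
# pushes anchor/negative embeddings apart.
triplet = nn.TripletMarginLoss(margin=1.0)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-2)
anchor, pos, neg = torch.randn(8, 10), torch.randn(8, 10), torch.randn(8, 10)
for _ in range(10):
    opt.zero_grad()
    triplet(encoder(anchor), encoder(pos), encoder(neg)).backward()
    opt.step()

# Stage 2: add a classification head and fine-tune encoder + head jointly
# with the classification (cross-entropy) loss.
head = nn.Linear(4, 2)
params = list(encoder.parameters()) + list(head.parameters())
clf_opt = torch.optim.Adam(params, lr=1e-2)
ce = nn.CrossEntropyLoss()
x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))
for _ in range(10):
    clf_opt.zero_grad()
    ce(head(encoder(x)), y).backward()
    clf_opt.step()
```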
Hi @aliosia, any luck with trying with SBERT features fine-tuned for classification so far? Thanks!
Hi @nreimers , how would you use S-Bert for a multiclass classification task where the documents to be classified each contain many sentences (say, 30)? Is there an example of how this would be done?
@davidmosca The CrossEncoder can be used for this. Have a look at the examples in this repository
Hi @nreimers I have found this example but it only works for pairs of sentences. Is it possible to modify it to classify a full set of sentences? Thanks.
Just concat the sentences as a single text
Hi @nreimers, is there a maximum number of words that I might exceed if I concatenate all sentences? If so, is it possible to change this parameter, or to go for an alternative solution (that preserves all sentences)?
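Regarding the length question: transformer models do have a fixed maximum sequence length, measured in subword tokens (often 512 for BERT-style models); text beyond that is truncated. One rough workaround when concatenating many sentences is to keep concatenating only while a budget allows. This helper is a hypothetical sketch, not part of sentence-transformers, and it uses whitespace-separated words as a crude proxy for subword tokens:

```python
def concat_within_budget(sentences, max_words=512):
    """Greedily concatenate sentences while staying under a word budget.

    Real model limits are in subword tokens; whitespace words are only
    a rough proxy. Sentences past the budget are dropped here; they
    could instead be embedded separately and pooled.
    """
    kept, used = [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if used + n > max_words:
            break
        kept.append(sentence)
        used += n
    return " ".join(kept)

print(concat_within_budget(["a b", "c d e", "f"], max_words=4))  # → "a b"
```

An alternative that preserves all sentences is to embed each sentence individually and average (mean-pool) the per-sentence embeddings into one document vector before classification.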
It seems an LM pretrained on NLI/paraphrase data directly gives better embeddings for downstream tasks.
Hi, thanks a lot for the great SBERT. I wanted to add a softmax layer on top of one of the pre-trained models and build a classifier, but I saw this and thought maybe there is no option for updating the weights of the pre-trained model; is this true?
If not: I wrote a customized Dataset class and called model.tokenize() in it, just like SentenceDataset. But when I built a dataset and passed it to a DataLoader, I got the following error:
RuntimeError: stack expects each tensor to be equal size, but got [295] at entry 0 and [954] at entry 1
I wonder if I should call prepare_for_model after calling the tokenize method, or something else? Thanks in advance.
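The "stack expects each tensor to be equal size" error above typically means the DataLoader's default collate tried to stack token sequences of different lengths (295 vs 954 tokens) into one tensor. Padding the batch to a common length first avoids it. A minimal sketch with plain PyTorch, not an official sentence-transformers helper:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# A collate_fn that right-pads every sequence in the batch to the length
# of the longest one, so torch.stack-style batching succeeds.
def collate(batch):
    return pad_sequence(batch, batch_first=True, padding_value=0)

short = torch.ones(3, dtype=torch.long)   # stands in for a 295-token example
longer = torch.ones(5, dtype=torch.long)  # stands in for a 954-token example
padded = collate([short, longer])
print(padded.shape)  # torch.Size([2, 5])
```

Passing `collate_fn=collate` to the DataLoader applies this per batch; in practice the padding value should match the tokenizer's pad token id, and an attention mask should be built alongside so the model ignores the padded positions.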