Open · GeraldFZ opened this issue 1 month ago
Hello!

It actually sounds a bit like you have 1 label, which either has a value of 0 or 1? Is that right? If so, SoftmaxLoss with `num_labels=1` is an option, I believe. I'm not sure whether BinaryCrossEntropy is a better loss function to use than SoftmaxLoss, but I assume that it's possible.
That said, SoftmaxLoss might not be the strongest option. You can consider converting your dataset to one that's compatible with e.g. CosineSimilarityLoss or MultipleNegativesRankingLoss as shown here: https://sbert.net/docs/sentence_transformer/loss_overview.html
Additionally, you might be interested in the SetFit project: https://github.com/huggingface/setfit/
- Tom Aarsen
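For illustration, converting a 0/1-labelled sentence-pair dataset into the float-score format that CosineSimilarityLoss expects could look roughly like this (the field names and example sentences here are just an assumption; adjust them to your own data):

```python
# Hypothetical 0/1-labelled sentence pairs (field names are illustrative).
binary_pairs = [
    {"sentence1": "A man is eating food.",
     "sentence2": "A man eats something.", "label": 1},
    {"sentence1": "A man is eating food.",
     "sentence2": "The girl is playing guitar.", "label": 0},
]

# CosineSimilarityLoss expects a float similarity score per pair, so the
# 0/1 labels are simply reinterpreted as target similarities 1.0 and 0.0.
score_pairs = [
    {"sentence1": p["sentence1"],
     "sentence2": p["sentence2"],
     "score": float(p["label"])}
    for p in binary_pairs
]

print(score_pairs[0]["score"], score_pairs[1]["score"])  # 1.0 0.0
```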
Thanks a lot for your answer!! I was trying CosineSimilarityLoss with BCE for this binary classification task, and considering your answer to my other question (https://github.com/UKPLab/sentence-transformers/issues/2753), may I ask your opinion: should I use sigmoid, (cos_sim + 1) / 2, or some other mapping?

Thanks and all the best,
Zhe
I think `(cos_sim + 1) / 2` makes more sense than the (default) sigmoid, as sigmoid will only use roughly 0.3 to 0.7 when fed values between -1 and 1.

That said, the cosine similarity is rarely negative, and perhaps you should push "unrelated" pairs to a cosine similarity of 0 rather than -1, e.g. like in #2753. In other words, perhaps the best solution is `relu(cos_sim)`, i.e. to just replace all negatives with 0?
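To make the range argument concrete, here is a quick pure-Python check (a sketch, not part of the original thread) comparing the three mappings on cosine values in [-1, 1]:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# With inputs limited to the cosine range [-1, 1], sigmoid only covers
# roughly [0.27, 0.73] instead of the full (0, 1) interval:
lo, hi = sigmoid(-1.0), sigmoid(1.0)

# The linear rescaling and relu variants use the full [0, 1] range:
rescale = lambda c: (c + 1.0) / 2.0
relu = lambda c: max(c, 0.0)

print(round(lo, 3), round(hi, 3))   # 0.269 0.731
print(rescale(-1.0), rescale(1.0))  # 0.0 1.0
print(relu(-0.5), relu(0.5))        # 0.0 0.5
```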
- Tom Aarsen
Thanks for your reply; it is very inspiring! So, as I understood from our discussion:

1. For this binary classification task using BCE, since I need to map the cosine similarity to (0, 1), transforming it with (cos_sim + 1) / 2 or relu(cos_sim) should be fine, or at least sounds more feasible than sigmoid.
2. For my non-binary prediction with label values from 0 to 1, according to your experiments, using the raw cosine similarity in (-1, 1) should be fine; and, as you said, in practice using ranges of 0 to 1 simply seems to result in higher scores. Even so, for this task it is worth experimenting with putting the labels and cosine values on the same range, either by (a) mapping the cosine to (0, 1) or (b) mapping my labels to (-1, 1).

May I know whether I understood it correctly?

Thanks for your patience!

Best,
Zhe
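As a concrete sanity check of option (a), mapping the cosine into (0, 1) before applying BCE can be sketched in plain Python (illustrative only; the embeddings below are made up, and in practice the loss is computed on batched tensors by the framework):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def bce(p, y, eps=1e-7):
    # Binary cross entropy for one prediction p in (0, 1) and label y in {0, 1};
    # clamping avoids log(0) at the boundaries.
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

# Toy sentence embeddings (made up for illustration).
emb1 = [0.2, 0.8, 0.1]
emb2 = [0.25, 0.75, 0.05]

cos = cosine_similarity(emb1, emb2)
p = (cos + 1.0) / 2.0  # map [-1, 1] -> [0, 1]
loss = bce(p, 1.0)     # label 1: a similar pair, so the loss should be small

print(round(cos, 3), round(p, 3), round(loss, 4))
```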
Hi, thanks for sharing your work; it's great!

I have a binary classification task with labels 0 and 1 over the embeddings of two sentences. May I ask whether I should use SoftmaxLoss for this binary task by just setting the parameter `num_labels` to 2? I am hesitating because it seems to use CrossEntropyLoss as the loss function rather than binary cross entropy... If not, may I know what loss function you would recommend? @tomaarsen @nreimers Thanks a lot! :)
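On the CrossEntropyLoss-vs-binary-cross-entropy worry above: with two classes, softmax over the logits [0, z] reduces to sigmoid(z), so 2-way cross entropy and binary cross entropy are mathematically equivalent. A quick numeric check of that identity (an editor's sketch, not a statement from the maintainers):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax2(z0, z1):
    m = max(z0, z1)  # subtract the max for numerical stability
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)

z = 1.3
_, p1 = softmax2(0.0, z)
# The probability of the positive class matches sigmoid of the logit difference:
print(abs(p1 - sigmoid(z)) < 1e-12)  # True
```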