UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0
15.15k stars 2.46k forks

Q: How score as label is getting used? #1213

Open deepankar27 opened 3 years ago

deepankar27 commented 3 years ago

Hello Team,

I have one small confusion: in both the cross-encoder and the bi-encoder you take the semantic score as the label, but my confusion is how it gets mapped or used during the training process. Can you please throw some light on this?

InputExample(texts=['sentence1', 'sentence2'], label=0.3),

nreimers commented 3 years ago

A bi-encoder processes the two inputs independently, produces embeddings, and then computes the cosine similarity.
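The bi-encoder's similarity step can be sketched in plain Python. The embedding values below are made up purely for illustration; in practice each sentence is encoded independently by the transformer:

```python
import math

# Toy sentence embeddings, as a bi-encoder might produce them
# independently for each input (values invented for illustration).
emb1 = [0.2, 0.7, 0.1]
emb2 = [0.3, 0.5, 0.4]

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

score = cosine_similarity(emb1, emb2)
print(round(score, 3))  # a value in [-1, 1]
```

Because each input is embedded on its own, embeddings can be precomputed and compared cheaply at scale, which is the main advantage over a cross-encoder.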

A cross-encoder concatenates the inputs, passes them through the transformer network, takes the CLS token output, and performs a down-projection to 1 dimension, which is the output.
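The down-projection at the end of the cross-encoder is just a linear layer mapping the CLS vector to a single score. A schematic sketch, with all numbers invented for illustration:

```python
# Schematic cross-encoder head: the transformer's CLS token output is
# down-projected to a single logit by a linear layer (w . cls + b).
# All values below are hypothetical.
cls_vector = [0.5, -0.2, 0.8, 0.1]   # hypothetical CLS token output
weights = [0.4, 0.3, -0.1, 0.6]      # hypothetical projection weights
bias = 0.05                          # hypothetical bias term

# Linear down-projection to 1 dimension
logit = sum(w * x for w, x in zip(weights, cls_vector)) + bias
print(logit)
```

During training this scalar output is compared against the label, so the cross-encoder sees both sentences jointly but cannot produce reusable per-sentence embeddings.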

deepankar27 commented 3 years ago

@nreimers Thank you for the prompt reply. So the semantic score used as a label is only for validation?

nreimers commented 3 years ago

Sometimes for validation, sometimes also for training. It depends on the specific script.

deepankar27 commented 3 years ago

All right! It would be great if you could tell me which scripts use it, and I will take it from there. My intent was just to clarify whether the scores are used for fine-tuning the models or not, that's all.