cross-encoder/quora-distilroberta-base has false positive for short word

UKPLab / sentence-transformers

State-of-the-Art Text Embeddings

Apache License 2.0


Hello!

Dense NLP models remain black boxes that can make mistakes, especially when the inputs differ substantially from those encountered during training. https://huggingface.co/cross-encoder/quora-distilroberta-base was trained on Quora questions, so it's not too surprising that it doesn't do well with a short, single-word input like:

"FAQ"

You may have better luck with a different cross-encoder, but there will likely always be outliers where the model gives odd results. I'll close this for now, as I don't think we can fix this issue outright.
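A minimal sketch of one way to live with such outliers, assuming the `sentence-transformers` package and assuming the single-label model's `predict` scores can be read as duplicate probabilities. `is_duplicate`, `run_demo`, the 0.5 threshold and the three-word minimum are hypothetical choices for illustration, not part of the library or the model card. Inputs that look nothing like a Quora question (for example a lone word such as "FAQ") are treated as out-of-distribution, and the helper abstains by returning `None` instead of trusting the score. The scorer is passed in as a function, so a different cross-encoder can be swapped in without changing the check.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical scorer type: takes (question_a, question_b) pairs and returns
# one score per pair. CrossEncoder.predict can be passed in directly.
ScoreFn = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def is_duplicate(
    score_fn: ScoreFn,
    q1: str,
    q2: str,
    threshold: float = 0.5,  # assumed cut-off, tune on your own data
    min_words: int = 3,      # assumed minimum length for a "question"
) -> Optional[bool]:
    """Return True/False for a question pair, or None to abstain.

    Very short inputs are unlike the Quora questions the model was
    trained on, so their scores are not trusted.
    """
    if min(len(q1.split()), len(q2.split())) < min_words:
        return None  # out-of-distribution: too short to judge reliably
    score = float(score_fn([(q1, q2)])[0])
    return score >= threshold


def run_demo() -> None:
    # Requires `pip install sentence-transformers`; downloads the model.
    from sentence_transformers import CrossEncoder

    model = CrossEncoder("cross-encoder/quora-distilroberta-base")
    pairs = [
        ("FAQ", "How do I reset my password?"),
        ("How do I learn Python?", "What is the best way to learn Python?"),
    ]
    for q1, q2 in pairs:
        score = float(model.predict([(q1, q2)])[0])
        verdict = is_duplicate(model.predict, q1, q2)
        print(f"{q1!r} vs {q2!r}: raw score={score:.3f}, verdict={verdict}")
```

Calling `run_demo()` prints the raw score next to the guarded verdict, so the "FAQ" pair shows up as an abstention regardless of how high the model scores it. A length check is a crude proxy for "out of distribution"; it will not catch every odd input, which matches the point above that some outliers are unavoidable.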

Tom Aarsen


cross-encoder/quora-distilroberta-base has false positive for short word #2919