UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Deduplication integrated in CachedGISTEmbedLoss #3063

Open yjoonjang opened 1 day ago

yjoonjang commented 1 day ago

Hello @tomaarsen, I'm a student who loves using the sentence-transformers library.

While reading the code, I thought that deduplication could be integrated into CachedGISTEmbedLoss (or just GISTEmbedLoss) by changing the code from

ap_sim[guided_ap_sim > guided_sim] = -torch.inf
aa_sim[guided_aa_sim > guided_sim] = -torch.inf
pp_sim[guided_pp_sim > guided_sim] = -torch.inf

to

ap_sim[guided_ap_sim >= guided_sim] = -torch.inf
aa_sim[guided_aa_sim >= guided_sim] = -torch.inf
pp_sim[guided_pp_sim >= guided_sim] = -torch.inf

The only change is adding the equals sign. What do you think about this?
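To make the difference between `>` and `>=` concrete, here is a minimal plain-Python sketch of the masking step on a toy 3x3 example. It is not the library's code: the helper `mask_false_negatives`, the list-of-lists stand-ins for tensors, and the toy numbers are all hypothetical, and it assumes that `guided_sim` holds each anchor's guided similarity to its own positive (the diagonal of `guided_ap_sim`).

```python
import math


def mask_false_negatives(sim, guided, inclusive):
    """Return a copy of `sim` with -inf wherever the guide model scores a pair
    higher than (or, if `inclusive`, at least as high as) the anchor's own positive."""
    masked = [row[:] for row in sim]
    for i, guided_row in enumerate(guided):
        threshold = guided_row[i]  # assumption: guided_sim is the diagonal entry
        for j, g in enumerate(guided_row):
            if g > threshold or (inclusive and g == threshold):
                masked[i][j] = -math.inf
    return masked


# Toy guide-model similarities: positive 2 is an exact duplicate of positive 0.
guided_ap_sim = [
    [0.9, 0.2, 0.9],
    [0.1, 0.8, 0.1],
    [0.9, 0.3, 0.9],
]
# Toy similarities from the model being trained.
ap_sim = [
    [0.7, 0.1, 0.7],
    [0.2, 0.6, 0.2],
    [0.7, 0.4, 0.7],
]

strict = mask_false_negatives(ap_sim, guided_ap_sim, inclusive=False)
inclusive = mask_false_negatives(ap_sim, guided_ap_sim, inclusive=True)

print(strict[0])     # with `>`, the duplicate in column 2 stays as a negative
print(inclusive[0])  # with `>=`, the duplicate in column 2 is masked
print([inclusive[i][i] for i in range(3)])  # diagonal entries after `>=`
```

In this toy run, `>=` masks the duplicate, but it also masks the diagonal, because each diagonal entry is equal to its own threshold. If `guided_sim` is defined this way in the library, the `ap_sim` line would presumably need to exempt each anchor's own positive from the mask. That depends on the actual implementation and is worth checking there.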

tomaarsen commented 1 day ago

cc @JINO-ROHIT

JINO-ROHIT commented 21 hours ago

Hi @yjoonjang @tomaarsen, we are experimenting with the same idea in issue #2756. Do have a look at my comments there.