dennlinger / TopicalChange

Code accompanying the submission "Structural Text Segmentation of Legal Documents" by Aumiller et al.

can this model find cosine similarity between two paragraphs #13

Open desis123 opened 1 year ago

desis123 commented 1 year ago

I was just wondering whether the https://huggingface.co/dennlinger/roberta-cls-consec model can be used to compute cosine or dot-product similarity between two paragraphs of text, the way Sentence-BERT computes cosine similarity between two sentences.

dennlinger commented 1 year ago

Hi @desis123, by default, I would say it cannot. Our models were trained with a combined input setting (i.e., two paragraphs fed into the same forward pass, separated by a [SEP] token).
In comparison, late interaction models (or more generally, dual encoders) process not two paragraphs at a time, but one. Each paragraph gets its own embedding, and those embeddings are what you would compare with cosine or dot-product similarity. Since our model only ever sees the two paragraphs together, I would argue that it is not particularly suited to producing meaningful embeddings.

Best, Dennis
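
To make the distinction above concrete, here is a minimal sketch in Python; it is not code from the repository. `cosine_similarity` is a plain helper showing what a dual encoder setup would apply to two separately computed embeddings. `score_pair` is a hypothetical helper that feeds both paragraphs through the model in one forward pass, assuming the checkpoint loads with `AutoModelForSequenceClassification` from the `transformers` library (check the model card to confirm this and to see what each output index means). Passing the paragraphs as a text pair lets the tokenizer insert whatever separator tokens the model expects.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors.

    This is what a dual encoder setup computes on two embeddings
    that were produced independently of each other.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def score_pair(first, second, model_name="dennlinger/roberta-cls-consec"):
    """Hypothetical helper: score two paragraphs in ONE forward pass.

    Assumption: the checkpoint is a sequence classification model.
    Returns the softmax over its output classes; which index means
    "same topic" is not stated here, so consult the model card.
    """
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    # Text-pair input: both paragraphs end up in the same sequence.
    inputs = tokenizer(first, second, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0].tolist()
```

In this sketch the model is only ever asked about the pair as a whole, so there is no per-paragraph vector to hand to `cosine_similarity`. For embedding-based similarity, a model trained as a dual encoder is likely the better fit.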