UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0
14.78k stars 2.43k forks source link

[Bug] Non-terminating behavior for text-summarization example #1425

Open dennlinger opened 2 years ago

dennlinger commented 2 years ago

I've encountered a bug with the summarization example, where the underlying Lexrank function would not converge, and iterate forever.

The root cause of this behavior seems to be a problem of the assumption of Lexrank; any (similarity) matrix supplied must follow the simple principle of representing a "stochastic, irreducible and non-periodic [Markov chain] matrix." (see the paper, Algorithm 2).

While this holds true for similarity matrixes generated with graph degrees (as in the original paper), the "transition probabilities" in the case of sentence-transformers correspond to the cosine similarity -- which can be negative. This negative value violates the reducibility and periodicity criteria, which can cause the iterative power method to diverge (which is used to compute the dominant eigenvaluem, eventually forming the centrality distribution).

I'm proposing a simple fix by normalizing slightly differently soon! Best, Dennis

dennlinger commented 2 years ago

For reference, I've also discussed this upstream in the Lexrank repository