dkpro / dkpro-similarity

Word and text similarity measures
https://dkpro.github.io/dkpro-similarity

Effective similarity matrix computation #47

Open iokuznetsov opened 8 years ago

iokuznetsov commented 8 years ago

It would be nice to have a method that, given two lists of terms of lengths N and M, returns an N×M similarity matrix for those terms. First, this representation is generic and has many use cases. Second, batch similarity computation can be optimized to perform better than the naive pairwise approach and to minimize reads. For VSMs, it can even be implemented as a single matrix operation instead of pairwise vector similarities. The mtj library used in dkpro-similarity is powered by BLAS, so it should be possible to perform basic linear algebra operations really fast.
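To illustrate the VSM case, here is a minimal sketch in plain Java, assuming the term vectors are already loaded as dense `double[]` rows of equal length and that cosine is the similarity measure. The class and method names (`SimilarityMatrixSketch`, `normalizeRows`, `cosineMatrix`) are hypothetical and not part of dkpro-similarity. The idea is to normalize each vector once and then compute all N×M similarities as one matrix product, S = Â · B̂ᵀ, rather than recomputing norms for every pair.

```java
import java.util.Arrays;

// Hypothetical sketch, not dkpro-similarity API.
final class SimilarityMatrixSketch {

    /** Returns a copy of the rows scaled to unit Euclidean length; all-zero rows stay zero. */
    static double[][] normalizeRows(double[][] vectors) {
        double[][] out = new double[vectors.length][];
        for (int i = 0; i < vectors.length; i++) {
            double sumSq = 0.0;
            for (double v : vectors[i]) {
                sumSq += v * v;
            }
            double norm = Math.sqrt(sumSq);
            out[i] = new double[vectors[i].length];
            if (norm > 0.0) {
                for (int k = 0; k < vectors[i].length; k++) {
                    out[i][k] = vectors[i][k] / norm;
                }
            }
        }
        return out;
    }

    /**
     * Computes the N x M cosine similarity matrix for N row vectors in a
     * and M row vectors in b. Each vector is normalized exactly once, so
     * every entry reduces to a dot product of unit vectors.
     */
    static double[][] cosineMatrix(double[][] a, double[][] b) {
        double[][] an = normalizeRows(a);
        double[][] bn = normalizeRows(b);
        double[][] s = new double[an.length][bn.length];
        for (int i = 0; i < an.length; i++) {
            for (int j = 0; j < bn.length; j++) {
                if (an[i].length != bn[j].length) {
                    throw new IllegalArgumentException("Vectors must have equal length");
                }
                // This triple loop is a plain matrix product A_hat * B_hat^T;
                // a BLAS-backed multiply could replace it.
                double dot = 0.0;
                for (int k = 0; k < an[i].length; k++) {
                    dot += an[i][k] * bn[j][k];
                }
                s[i][j] = dot;
            }
        }
        return s;
    }

    public static void main(String[] args) {
        double[][] termsN = {{1, 0}, {1, 1}};
        double[][] termsM = {{1, 0}, {0, 1}, {2, 2}};
        System.out.println(Arrays.deepToString(cosineMatrix(termsN, termsM)));
    }
}
```

The naive approach would compute both norms again for each of the N·M pairs; here each vector is read and normalized once. Whether the product should go through mtj or stay in plain Java is a design choice this sketch leaves open.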