dssg / givinggraph

An API tool to help understand the relationships between non-profits, for-profits, and the causes they support.
https://github.com/dssg/givinggraph/wiki/API
MIT License

Benchmark gensim similarity calculations vs. custom calculations #5

Closed: JohnHBrock closed this issue 10 years ago

JohnHBrock commented 11 years ago

gensim may be slower than what we can do manually, perhaps with some help from nltk. With gensim, we compute the similarity of every pair of texts twice, once in each order. Because cosine similarity is symmetric (sim(a, b) = sim(b, a)), we only need to compute one triangle of the resulting similarity matrix.
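A minimal sketch of the triangle-only idea in plain Python, not the project's actual code. It assumes simple lowercase/whitespace tokenization and raw term-frequency vectors (no TF-IDF weighting and no nltk), and the helper names `tf_vector`, `cosine`, `pairwise_similarities`, and `lookup` are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    # Hypothetical preprocessing: lowercase + whitespace split.
    # A real pipeline might use nltk tokenizers and TF-IDF weights instead.
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    # The dot product only needs the terms the two vectors share.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarities(texts):
    """Return {(i, j): similarity} for i < j only.

    Since sim(i, j) == sim(j, i), the lower triangle is never computed,
    and the diagonal is skipped because a non-empty text always has
    similarity 1.0 with itself.
    """
    vectors = [tf_vector(t) for t in texts]
    sims = {}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            sims[(i, j)] = cosine(vectors[i], vectors[j])
    return sims


def lookup(sims, i, j):
    # Answer any (i, j) query from the stored upper triangle.
    # Assumes non-empty texts, so the diagonal is taken to be 1.0.
    if i == j:
        return 1.0
    return sims[(i, j)] if i < j else sims[(j, i)]
```

For n texts this performs n(n - 1)/2 cosine computations instead of n², e.g. 3 instead of 9 for `pairwise_similarities(["food bank volunteers", "volunteers at the food bank", "animal shelter"])`.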

JohnHBrock commented 10 years ago

Giorgio and I compared the results: the similarity scores produced by the old custom calculations and the new gensim calculations are very close, but the gensim code is faster.