FinalsClub / djKarma

GNU General Public License v3.0

[Feature] Gensim similarity text analysis #213

Open sethwoodworth opened 11 years ago

sethwoodworth commented 11 years ago

This is a big feature. It is listed here as a placeholder for the conversation about whether, and when, to add it.


Gensim is a free Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible. ... Once these statistical patterns are found, any plain text documents can be succinctly expressed in the new, semantic representation, and queried for topical similarity against other documents.

http://radimrehurek.com/gensim/intro.html
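To make "queried for topical similarity against other documents" concrete, here is a minimal, standard-library-only sketch of the simplest version of that idea: weight each note's words with TF-IDF, then rank notes by cosine similarity to a query. This is a deliberately simplified stand-in, not gensim's API. Gensim usually goes one step further and projects the TF-IDF vectors into a latent topic space (e.g. LSI or LDA) before comparing, so it can match on shared topics rather than only on shared words. In gensim itself the corresponding building blocks are `corpora.Dictionary`, `models.TfidfModel` and `similarities.MatrixSimilarity`. The tokenizer, the `rank_similar` helper, and the sample note ids and texts below are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics; drop 1-char tokens (crude, no stemming)."""
    return [tok for tok in re.findall(r"[a-z0-9]+", text.lower()) if len(tok) > 1]


def build_idf(tokenized_docs):
    """Inverse document frequency log(N / df) for every term seen in the corpus.

    A term that appears in every document gets weight 0.
    """
    n_docs = len(tokenized_docs)
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unknown to the corpus are dropped."""
    tf = Counter(tokens)
    return {term: freq * idf[term] for term, freq in tf.items() if term in idf}


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query, documents):
    """Return (doc_id, score) pairs, most similar to the query first."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in documents.items()}
    idf = build_idf(list(tokenized.values()))
    vectors = {doc_id: tfidf_vector(toks, idf) for doc_id, toks in tokenized.items()}
    query_vec = tfidf_vector(tokenize(query), idf)
    scores = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in vectors.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical course notes keyed by made-up ids.
    notes = {
        "bio-101": "Cell membranes regulate transport of ions and proteins into the cell.",
        "chem-201": "Reaction rates depend on temperature and the concentration of reactants.",
        "bio-102": "Proteins fold into shapes that determine their function in the cell.",
    }
    for doc_id, score in rank_similar("proteins and their function in the cell", notes):
        print(f"{doc_id}: {score:.3f}")
```

With these sample notes the query ranks `bio-102` first, `bio-101` second and `chem-201` last. Because there is no stemming or topic model, a query for "protein" would not match "proteins" at all; closing that kind of gap is what gensim's semantic representation is meant to do.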

There is also a pre-packaged server implementation of the library that looks ideal for running as a dedicated server for document-similarity processing.

https://github.com/piskvorky/gensim-simserver

It uses a strong copyleft license, the AGPL (GNU Affero General Public License):

This means you may use simserver freely in your application (even commercial application!), but you must then open-source your application as well, under an AGPL-compatible license.

Luckily for us, our own license (GNU GPL v3.0) is compatible with theirs.