Development Corpus Verb Ranking

Wikidata / StrepHit

An intelligent reading agent that understands text and translates it into Wikidata statements.

GNU General Public License v3.0


marfox closed this issue 8 years ago

marfox commented 8 years ago

Build a ranking out of the list of extracted verbs as per #4

marfox commented 8 years ago

rank_verbs.py is responsible for this task. Currently, it performs the following steps:

compute the TF/IDF matrix via TfidfVectorizer;
compute the cosine similarity score between each verb token and each corpus document, via linear_kernel;
compute the standard deviation of the similarity score list;
output 2 verb lemma rankings:
1. average similarity score of all token scores;
2. average standard deviation.
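
The steps above can be sketched roughly as follows. This is not the actual rank_verbs.py, only a minimal illustration assuming scikit-learn and NumPy are installed. The function name rank_verbs, the lemma_to_tokens mapping and the sample corpus are hypothetical, and averaging per-token standard deviations within each lemma is one possible reading of "average standard deviation".

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel


def rank_verbs(corpus, lemma_to_tokens):
    """Return two lemma rankings: by mean similarity and by mean std. dev.

    corpus: list of document strings.
    lemma_to_tokens: hypothetical mapping {lemma: [non-empty list of verb tokens]}.
    """
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(corpus)  # shape: (n_docs, n_terms)

    avg_score, avg_std = {}, {}
    for lemma, tokens in lemma_to_tokens.items():
        # Assumption: each verb token is treated as a one-word query document.
        queries = vectorizer.transform(tokens)
        # TF-IDF rows are L2-normalised by default, so the plain dot product
        # computed by linear_kernel equals the cosine similarity.
        sims = linear_kernel(queries, tfidf)  # shape: (n_tokens, n_docs)
        # Ranking 1: average over all token scores of the lemma.
        avg_score[lemma] = float(np.mean(sims))
        # Ranking 2: std. dev. of each token's score list, averaged per lemma.
        avg_std[lemma] = float(np.mean(np.std(sims, axis=1)))

    by_score = sorted(avg_score.items(), key=lambda kv: kv[1], reverse=True)
    by_std = sorted(avg_std.items(), key=lambda kv: kv[1], reverse=True)
    return by_score, by_std


if __name__ == "__main__":
    sample_corpus = [
        "She was born in Rome and studied in Paris.",
        "He studied law and was elected mayor.",
        "They married in 1900.",
    ]
    sample_verbs = {
        "study": ["studied", "studies"],
        "marry": ["married"],
        "fly": ["flew"],
    }
    by_score, by_std = rank_verbs(sample_corpus, sample_verbs)
    print(by_score)
    print(by_std)
```

A token that never occurs in the corpus maps to an all-zero vector, so its lemma ends up at the bottom of both rankings with a score of 0.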

marfox commented 8 years ago

Here are the: