Wikidata / StrepHit

An intelligent reading agent that understands text and translates it into Wikidata statements.
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
GNU General Public License v3.0

Development Corpus Verb Ranking #5

Closed: marfox closed this issue 8 years ago

marfox commented 8 years ago

Build a ranking out of the list of extracted verbs as per #4

marfox commented 8 years ago

rank_verbs.py is responsible for this task. Currently, it performs the following steps:

  1. compute the TF-IDF matrix via TfidfVectorizer;
  2. compute the cosine similarity score between each verb token and each corpus document, via linear_kernel;
  3. compute the standard deviation of the resulting list of similarity scores;
  4. output 2 rankings of verb lemmas:
    1. by the average similarity score over all token scores;
    2. by the average standard deviation.
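
The steps above can be sketched roughly as follows. This is not the code in rank_verbs.py: it is a standard-library-only stand-in, so the TF-IDF weighting (smoothed IDF, L2 normalization) and the dot product are written out by hand in place of scikit-learn's TfidfVectorizer and linear_kernel. The function names, the whitespace tokenization, the lemma-to-tokens mapping, the toy corpus, and the choice of population standard deviation are all assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def tfidf_vectors(docs):
    """Return one L2-normalized TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # assumption: naive tokenization
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        weights = {t: count * idf[t] for t, count in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


def rank_verbs(docs, lemma_to_tokens):
    """Rank lemmas by average similarity and by average standard deviation."""
    vectors = tfidf_vectors(docs)
    avg_score, avg_std = {}, {}
    for lemma, tokens in lemma_to_tokens.items():
        token_means, token_stds = [], []
        for token in tokens:
            # A single-token query is a unit vector on that term, so its dot
            # product with a normalized document vector (the cosine
            # similarity) is just the document's weight for the term.
            sims = [vec.get(token, 0.0) for vec in vectors]
            token_means.append(mean(sims))
            token_stds.append(pstdev(sims))  # assumption: population std dev
        avg_score[lemma] = mean(token_means)
        avg_std[lemma] = mean(token_stds)
    by_score = sorted(avg_score, key=avg_score.get, reverse=True)
    by_std = sorted(avg_std, key=avg_std.get, reverse=True)
    return by_score, by_std


# Hypothetical toy corpus and verb tokens grouped by lemma.
docs = [
    "he played for the club",
    "she plays the piano",
    "he was born in rome",
]
lemma_to_tokens = {"play": ["played", "plays"], "bear": ["born"], "fly": ["flew"]}

by_score, by_std = rank_verbs(docs, lemma_to_tokens)
print(by_score)
print(by_std)
```

A lemma whose tokens never occur in the corpus ("fly" here) scores zero on both measures and lands at the bottom of both rankings.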

marfox commented 8 years ago

Here are the: