MLEVN / mlevn.github.io

ML EVN - Yerevan machine learning community
https://mlevn.org

add analogies dataset and new embeddings #49

Closed: tsolakghukasyan closed this 4 years ago

tsolakghukasyan commented 5 years ago

Adding a link to an adaptation of Google's word analogy task (15646 analogy questions) and pre-trained word embeddings:

- 200-dimensional GloVe
- 300-dimensional CBOW and SkipGram
- 200-dimensional fastText (.text, .bin)
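For readers unfamiliar with the analogy task: questions of the form "a is to b as c is to ?" are commonly scored with the vector-offset method, which picks the word closest (by cosine similarity) to `b - a + c`. The sketch below illustrates that idea on hand-made toy vectors. The function names, the toy vocabulary, and the choice of this scoring method are assumptions for illustration only; this PR does not state how the linked embeddings were evaluated.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(vectors, a, b, c):
    """Return the word closest to (b - a + c), excluding the three query words."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def analogy_accuracy(vectors, questions):
    """Share of questions answered correctly; out-of-vocabulary questions are skipped."""
    known = [q for q in questions if all(w in vectors for w in q)]
    if not known:
        return 0.0
    correct = sum(solve_analogy(vectors, a, b, c) == d for a, b, c, d in known)
    return correct / len(known)


# Toy 3-dimensional vectors (hypothetical, not taken from the linked embeddings).
toy_vectors = {
    "man": [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.2, -1.0],
}
toy_questions = [("man", "woman", "king", "queen")]

accuracy = analogy_accuracy(toy_vectors, toy_questions)
print(f"analogy accuracy: {accuracy:.2f}")
```

With real embeddings, `toy_vectors` would be replaced by a word-to-vector mapping loaded from one of the files above, and `toy_questions` by the four-word rows of the analogy dataset.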

The training data (90.5 million tokens) included:

- Wikipedia
- fiction texts taken from the open part of the EANC corpus
- HC Corpora, containing blogs and news articles collected by Hans Christensen in 2011
- the digitized and reviewed part of the Armenian Soviet Encyclopedia from Wikisource
- news articles