Training of the first sentiment model - Githubissues

Svetuf / CSC_Sentiment_Russian_NER

Training of the first sentiment model #1

Closed roddar92 closed 3 years ago

roddar92 commented 3 years ago

Splitting the sample into train and test sets:

shuffle the documents
80% of documents - train, 20% - test
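
The split above could be sketched as follows. This is a minimal illustration using only the standard library; the function name `train_test_split_docs`, the fixed seed, and the placeholder tweets are assumptions made for the example, not part of the issue.

```python
import random

def train_test_split_docs(documents, test_share=0.2, seed=42):
    """Shuffle documents and split them into train and test parts."""
    shuffled = list(documents)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # seeded shuffle for reproducibility
    n_test = round(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]

tweets = [f"tweet {i}" for i in range(10)]  # placeholder documents
train_docs, test_docs = train_test_split_docs(tweets)
```

Fixing the seed keeps the split stable between runs, so results of different models stay comparable.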

Algorithm:

build word2vec embeddings
train a model: one of GBM (e.g. LightGBM) / SVM / ...

Extension of word2vec:

Train a TF-IDF model (bag-of-words) on the normalized corpus
TF = frequency of a word in a document, IDF = log(N / number of documents containing the word), where N is the total number of documents
Weight each word vector by its IDF: word2vec.get_vector(word) * idf(word)
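
A rough sketch of the IDF weighting, using the IDF formula given above. The helper names, the toy corpus, and the 2-dimensional `toy_vectors` lookup (standing in for `word2vec.get_vector`) are hypothetical. Dividing the weighted sum by the number of words is also an assumption, since the issue only specifies the per-word product.

```python
import math
from collections import Counter

def compute_idf(tokenized_docs):
    """IDF(word) = log(N / number of documents containing the word)."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))      # count each word once per document
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}

def idf_weighted_vector(tokens, get_vector, idf):
    """Sum get_vector(word) * idf(word), then divide by the number of words used."""
    total, used = None, 0
    for word in tokens:
        if word not in idf:
            continue                      # skip words unseen in the corpus
        scaled = [x * idf[word] for x in get_vector(word)]
        total = scaled if total is None else [a + b for a, b in zip(total, scaled)]
        used += 1
    return [x / used for x in total] if used else None

corpus = [["good", "movie"], ["bad", "movie"], ["good", "day"]]
idf = compute_idf(corpus)

# Hypothetical 2-d vectors standing in for word2vec.get_vector(word).
toy_vectors = {"good": [1.0, 0.0], "bad": [-1.0, 0.0],
               "movie": [0.0, 1.0], "day": [0.0, 2.0]}
doc_vec = idf_weighted_vector(["good", "movie"], toy_vectors.__getitem__, idf)
```

With this formula a word that occurs in every document gets IDF = log(1) = 0, so it contributes nothing to the document vector.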

Here, a document is a single tweet.