WladimirSidorenko / SentiLex

Sentiment Lexicon Generation Suite
MIT License

Paper on SentiLex? #1

RacheleSprugnoli closed this issue 4 years ago

RacheleSprugnoli commented 5 years ago

Dear Wladimir, is there a paper describing the SentiLex project? Do you have suggestions about the best parameter setting of vec2dic? Thanks in advance. Best, Rachele

WladimirSidorenko commented 5 years ago

Hello Rachele,

Glad that you are interested in the project.

Yes, there is a paper (mainly on the re-implementations of existing algorithms). It was presented at the PEOPLES workshop and is also available on arXiv; you can also find the BibTeX data of this publication below:

@article{Sidarenka:16,
  author    = {Uladzimir Sidarenka and Manfred Stede},
  title     = {Generating Sentiment Lexicons for German Twitter},
  journal   = {CoRR},
  volume    = {abs/1610.09995},
  year      = {2016},
  url       = {http://arxiv.org/abs/1610.09995},
  archivePrefix = {arXiv},
  eprint    = {1610.09995},
  timestamp = {Mon, 13 Aug 2018 16:46:10 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/SidarenkaS16},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

There might be another paper on NWE approaches next year; they will also be described in my dissertation, which is due to appear within the next two weeks. I'll post a link either here or in the README file once it has been published.

As for word-embedding-based algorithms, some of my best results so far came from the k-NN approach with the seed set by Kim and Hovy (2004):

./bin/vec2dic --type=1 VECTOR_FILE data/seeds/kim_hovy_2004.txt

The VECTOR_FILE should be in word2vec plain text format with tab-separated fields. I have only experimented with word2vec, but you can try other vectors, such as fastText or BERT; they may yield better scores.
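To make the k-NN idea above a bit more concrete, here is a minimal Python sketch. It is not the vec2dic implementation, only an illustration under stated assumptions: each vector line is taken to be `word<TAB>v1<TAB>v2...`, an optional word2vec-style header line is skipped, and the seed set is a plain dictionary mapping words to polarity labels. The function names, the toy vectors, and the toy seeds are all hypothetical; the actual layout of data/seeds/kim_hovy_2004.txt is not described in this thread.

```python
import math
from collections import Counter


def parse_vectors(lines):
    """Parse 'word<TAB>v1<TAB>v2...' lines into a {word: [floats]} dict."""
    vectors = {}
    for i, line in enumerate(lines):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # blank line, or a space-separated header such as "5 2"
        if i == 0 and len(fields) == 2 and all(f.isdigit() for f in fields):
            continue  # tab-separated "vocab_size<TAB>dimension" header
        vectors[fields[0]] = [float(x) for x in fields[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_polarity(word, vectors, seeds, k=3):
    """Label `word` by majority vote among its k most similar seed words."""
    if word not in vectors:
        return None
    scored = sorted(
        (
            (cosine(vectors[word], vectors[seed]), label)
            for seed, label in seeds.items()
            if seed in vectors and seed != word
        ),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0] if votes else None


# Toy data, invented purely for illustration (not taken from the project).
TOY_LINES = [
    "gut\t0.9\t0.1\n",
    "toll\t0.8\t0.2\n",
    "schlecht\t-0.9\t0.1\n",
    "mies\t-0.8\t0.3\n",
    "super\t0.7\t0.1\n",
]
TOY_SEEDS = {
    "gut": "positive",
    "toll": "positive",
    "schlecht": "negative",
    "mies": "negative",
}

vectors = parse_vectors(TOY_LINES)
print(knn_polarity("super", vectors, TOY_SEEDS, k=3))
```

With real embeddings you would pass an open file handle to `parse_vectors` instead of the toy list. The choice of cosine similarity and a simple majority vote is an assumption of this sketch; the tool itself may weight or normalize neighbors differently.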

RacheleSprugnoli commented 5 years ago

Dear Wladimir, thanks a lot for your kind and quick reply! Looking forward to reading your thesis! Best, Rachele

WladimirSidorenko commented 4 years ago

Hello Rachele,

You can find the thesis here and cite it as:

@phdthesis{Sidarenka2019,
  author      = {Uladzimir Sidarenka},
  title       = {Sentiment analysis of German Twitter},
  type        = {doctoralthesis},
  pages       = {vii, 217},
  school      = {Universit{\"a}t Potsdam},
  doi         = {10.25932/publishup-43742},
  year        = {2019},
}

if you wish.

I think you might be most interested in the third chapter. Let me know if you are also interested in the results of the linear projection method.

RacheleSprugnoli commented 4 years ago

Thanks a lot! I will use it for the paper I'm writing. Best, Rachele