Amend sentiment.md - GitHub Issues

michael-evrythngwrx commented 1 year ago

Is your feature request related to a problem? Please describe.

Currently there isn't a reference for the absolute min and max sentiment score after processing.

Describe the solution you'd like

Can you provide the scale of the sentiment score in the sentiment.md file? For example: -1 to 1.

jesus-seijas-sp commented 1 year ago

The sentiment analysis is done with lexicons, and there are three possible types of lexicon for each language:

AFINN: each word of the lexicon has a value between -5 (very negative) and 5 (very positive)
Senticon: each word of the lexicon has a value between -1 (very negative) and 1 (very positive)
Pattern: each word of the lexicon has a value between -1 (very negative) and 1 (very positive)

There is a table of the lexicons available per language: https://github.com/axa-group/nlp.js/blob/master/docs/v4/language-support.md#sentiment-analysis

Sentiment analysis usually aims at a classification into three classes: negative, neutral and positive. If the score is negative, the sentiment is negative; if the score is 0, the sentiment is neutral; if the score is positive, the sentiment is positive.
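
The sign rule described here can be sketched in a few lines. This is only an illustration: voteFromScore is a hypothetical helper name, not a function exported by NLP.js.

```javascript
// Map a numeric sentiment score to one of three classes by its sign.
// voteFromScore is a hypothetical helper, not part of the NLP.js API.
function voteFromScore(score) {
  if (score > 0) return 'positive';
  if (score < 0) return 'negative';
  return 'neutral';
}

console.log(voteFromScore(0.313)); // positive
console.log(voteFromScore(0));     // neutral
console.log(voteFromScore(-2));    // negative
```

Because only the sign matters, the same rule works whether the lexicon uses the -5 to 5 range or the -1 to 1 range.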

The type property of the sentiment analysis result indicates which lexicon was used.

Example response from the sentiment analysis:

 { score: 0.313,
   numWords: 3,
   numHits: 1,
   comparative: 0.10433333333333333,
   type: 'senticon',
   language: 'en' }

score will contain the total score for the sentence calculated with the lexicon
numWords will contain the number of words of the sentence
numHits will contain the number of words of the sentence that are in the lexicon dictionary
comparative will contain score / numWords
type will contain 'afinn', 'senticon' or 'pattern'
language will contain the language code, such as 'en'
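
As a rough sketch of how these fields relate to each other, the snippet below scores a tokenized sentence against a tiny made-up lexicon. The names toyLexicon and analyzeWords are hypothetical, the word values are invented for illustration, and the sketch leaves out the negation and stemming steps that NLP.js applies.

```javascript
// Toy lexicon with made-up values in the Senticon-style range [-1, 1].
// These are NOT real lexicon entries.
const toyLexicon = { good: 0.5, great: 0.75, bad: -0.5 };

// Hypothetical helper: sums the lexicon values of the words it finds.
function analyzeWords(words, lexicon) {
  let score = 0;
  let numHits = 0;
  for (const word of words) {
    if (Object.prototype.hasOwnProperty.call(lexicon, word)) {
      score += lexicon[word]; // total score is a sum over all hits
      numHits += 1;           // count of words present in the lexicon
    }
  }
  const numWords = words.length;
  // comparative is the total score divided by the number of words
  const comparative = numWords > 0 ? score / numWords : 0;
  return { score, numWords, numHits, comparative };
}

const result = analyzeWords(['the', 'food', 'was', 'good'], toyLexicon);
console.log(result);
// { score: 0.5, numWords: 4, numHits: 1, comparative: 0.125 }
```

The same arithmetic applies to the example response above: a score of 0.313 over 3 words gives a comparative of roughly 0.1043.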

To better understand AFINN, I recommend reading the original paper by Finn Årup Nielsen: https://arxiv.org/abs/1103.2903

For a comparison of the accuracy of different lexicon-based sentiment analysis methods, see: https://www.researchgate.net/publication/343473213_Evaluating_the_performance_of_the_most_important_Lexicons_used_to_Sentiment_analysis_and_opinions_Mining

NLP.js includes two main improvements: handling of negations and calculation of the stem of words. There is also a paper analyzing improvements to sentiment analysis, not all of which are used in NLP.js: https://www.sciencedirect.com/science/article/pii/S2090447921003105

I hope you enjoy the reading.

michael-evrythngwrx commented 1 year ago

Here is something I am running into:

 { sentiment:
   { score: 1.563,
     numWords: 20,
     numHits: 7,
     average: 0.07815,
     type: 'senticon',
     locale: 'en',
     vote: 'positive' } }

Senticon is returning a score greater than 1.

jesus-seijas-sp commented 1 year ago

As I said in my previous comment: "score will contain the total score for the sentence calculated with the lexicon". Total means the sum of several word scores. numHits is 7, so there are 7 words in your sentence that are in the lexicon; as numWords is 20, the other 13 words are not present in the lexicon. 1.563 is the sum of the scores of those 7 words.
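
To make the point concrete, here is a small sketch showing how a total can exceed 1 even when every word value stays within -1 to 1. The word values are invented for illustration and are not real Senticon entries. The last lines check the reported result under the assumption that each hit contributes at most 1 in magnitude, which puts the bound on the total at numHits rather than 1.

```javascript
// Hypothetical per-word values, each within [-1, 1]; not real Senticon entries.
const wordScores = { love: 0.75, great: 0.75, nice: 0.5 };
const sentence = ['i', 'love', 'this', 'great', 'and', 'nice', 'library'];

// Collect the values of the words found in the lexicon, then sum them.
const hitValues = sentence
  .filter((w) => Object.prototype.hasOwnProperty.call(wordScores, w))
  .map((w) => wordScores[w]);
const total = hitValues.reduce((sum, v) => sum + v, 0);

console.log(hitValues.length); // 3
console.log(total);            // 2, greater than 1 although no word exceeds 1

// The result reported above: 7 hits out of 20 words.
const reported = { score: 1.563, numWords: 20, numHits: 7 };
const maxPerWord = 1; // assumed per-word maximum for a [-1, 1] lexicon
const upperBound = reported.numHits * maxPerWord;

console.log(Math.abs(reported.score) <= upperBound); // true
console.log(reported.score / reported.numWords);     // about 0.07815, the reported average
```

For a score that is normalized by sentence length, the average field is the one to read.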

axa-group / nlp.js

Amend sentiment.md #1357