Closed rkrug closed 7 months ago
We will do our best to deliver the Sentiment scores by Friday
For (3), one could take all available sub-fields, manually sort them into clusters relating to point 3, and analyse the topics of each work (all topics, or only the main topic?). @niamir, any opinion on that?
Here is the outcome of the lexicon-based sentiment analysis.
To analyse the sentiment of the provided abstracts, we used the Python NLTK package and its VADER module (Valence Aware Dictionary and sEntiment Reasoner), which provides sentiment scores based on the words used. VADER is a rule-based sentiment analysis model built on a ready-made lexicon, in which terms are generally labelled by their semantic orientation as either positive or negative. The main reason for using this model is that it does not require a labelled training dataset. The model outputs four scores: compound, negative, neutral, and positive. The compound score is a composite that summarizes the overall sentiment of the text: scores close to 1 indicate positive sentiment, scores close to -1 indicate negative sentiment, and scores close to 0 indicate neutral sentiment. The other three scores give the proportion of the text that falls into each category, so they sum to approximately 1.
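For readers who want to reproduce the scoring, here is a minimal sketch of how the four scores can be obtained per abstract. It is not the exact script we ran. `make_vader_analyzer`, `score_abstracts` and `label_compound` are hypothetical helper names, and the ±0.05 cut-off on the compound score is a commonly used convention that we assume here for illustration, not something fixed by this analysis.

```python
def make_vader_analyzer():
    """Build NLTK's VADER analyzer (downloads the lexicon on first use)."""
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    return SentimentIntensityAnalyzer()


def score_abstracts(abstracts, analyzer):
    """Return one dict per abstract with keys 'neg', 'neu', 'pos', 'compound'."""
    return [analyzer.polarity_scores(text) for text in abstracts]


def label_compound(compound, threshold=0.05):
    """Map a compound score to a coarse label.

    The threshold is an assumed convention, not part of the original analysis.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Example usage (requires nltk to be installed):
#   analyzer = make_vader_analyzer()
#   scores = score_abstracts(["Subsidy reform improved outcomes."], analyzer)
#   labels = [label_compound(s["compound"]) for s in scores]
```

Passing the analyzer in as an argument keeps the scoring step separate from the NLTK setup, which makes it easy to swap in a different lexicon-based scorer later.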
Perhaps a line graph: x = time (year), y = score value, with line 1 = average neg score per year and line 2 = average pos score per year.
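The per-year averages for such a graph could be computed roughly as follows. This is only a sketch: it assumes each scored abstract is a dict with a `year` field (a hypothetical name) next to VADER's `neg` and `pos` scores, and the plotting helper assumes matplotlib is available.

```python
from collections import defaultdict
from statistics import mean


def yearly_means(records):
    """Average the 'neg' and 'pos' scores per publication year.

    `records` is an iterable of dicts with keys 'year', 'neg' and 'pos'
    ('year' is an assumed field name). Returns {year: (mean_neg, mean_pos)}
    with years in ascending order.
    """
    neg_by_year = defaultdict(list)
    pos_by_year = defaultdict(list)
    for rec in records:
        neg_by_year[rec["year"]].append(rec["neg"])
        pos_by_year[rec["year"]].append(rec["pos"])
    return {
        year: (mean(neg_by_year[year]), mean(pos_by_year[year]))
        for year in sorted(neg_by_year)
    }


def plot_yearly_means(means, path="sentiment_by_year.png"):
    """Draw one line for the mean neg score and one for the mean pos score."""
    import matplotlib.pyplot as plt

    years = list(means)
    fig, ax = plt.subplots()
    ax.plot(years, [means[y][0] for y in years], label="mean neg score")
    ax.plot(years, [means[y][1] for y in years], label="mean pos score")
    ax.set_xlabel("year")
    ax.set_ylabel("score")
    ax.legend()
    fig.savefig(path)
```

Years with very few abstracts will give noisy averages, so it may be worth showing the number of abstracts per year alongside the lines.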
There are more things that could be done - e.g. mapping to countries of first authors.
Please see https://ipbes-data.github.io/IPBES_TCA_Ch5_subsidies_reform/Report.html#sentiment-analysis-1 for the preliminary analysis.
@niamir Any ideas about points 2-5? Will we be involved?
Points 2-5 are pending input from the TCA team.
[ ] Step 3: We will conduct the following analysis: