For Frankfurt Meeting - GitHub issues

rkrug commented 8 months ago

[x] Step 1. Rainer will search for a sample of 50 papers to test whether the set of papers in the corpus, retrieved with the keywords we used, is robust enough, or whether the keywords need to be refined.
[x] #2
[ ] Step 3: We will conduct the following analysis:
1. [x] Temporal analysis (trends over time).
2. [ ] Amount of money allocated to each reform.
3. [ ] Sectoral analysis, using the same sectors as in the EHS dataset: agriculture, fisheries/aquaculture, mining, forestry, road infrastructure, irrigation.
4. [ ] Extract information (whenever possible) about the economic, social (health) and cultural effects of the subsidy reforms on nature and people.
5. [ ] International agreements: one of the analyses we might want to do is to see how international agreements have impacted the evolution of subsidy reforms. For that analysis, we need a timeline of international agreements to reform EHS. Viki started to collect a list of them; please contribute to it here.
6. [ ] Compare sentiments before and after the SDGs and the Paris Agreement (2018).

niamir commented 8 months ago

We will do our best to deliver the sentiment scores by Friday.

rkrug commented 8 months ago

For (3), one could take all available sub-fields, manually sort them into clusters corresponding to the sectors listed in (3), and analyse the topics of each work (all topics, or only the main topic?). @niamir, any opinion on that?

niamir commented 8 months ago

SentAnalysis_Scores.csv

Here is the outcome of the lexicon-based sentiment analysis.

niamir commented 8 months ago

To analyze the sentiment of the provided abstracts, we used the Python NLTK package and VADER (Valence Aware Dictionary and sEntiment Reasoner), an NLTK module that provides sentiment scores based on the words used. VADER is a rule-based, lexicon-driven sentiment analysis model in which terms are labeled according to their semantic orientation as either positive or negative. The main reason for using this model is that it does not require a labeled training dataset. The output of the model is four scores: compound, negative, neutral, and positive. The compound score is a composite score that summarizes the overall sentiment of the text: scores close to 1 indicate a positive sentiment, scores close to -1 indicate a negative sentiment, and scores close to 0 indicate a neutral sentiment. The other three scores give the proportion of the text that falls into each sentiment category.
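To make the four scores more concrete, here is a minimal pure-Python sketch of how a lexicon-based scorer can arrive at them. It is not the code used for the analysis: `TOY_LEXICON`, its valence values, and `toy_scores` are hypothetical, and the neg/neu/pos proportions are simplified to plain token shares. Only the `normalize` step (valence sum divided by the square root of its square plus 15) mirrors how VADER bounds the compound score.

```python
import math

# Hypothetical toy lexicon for illustration only; VADER ships its own,
# much larger lexicon plus rules for negation, intensifiers, etc.
TOY_LEXICON = {"good": 1.9, "benefit": 2.0, "harmful": -2.3, "loss": -1.3}


def normalize(score, alpha=15.0):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def toy_scores(text, lexicon=TOY_LEXICON):
    """Return simplified neg/neu/pos token shares and a compound score."""
    tokens = text.lower().split()
    # Unknown words get valence 0.0 and therefore count as neutral.
    valences = [lexicon.get(tok.strip(".,;:!?"), 0.0) for tok in tokens]
    n = len(valences) or 1  # avoid division by zero on empty text
    return {
        "neg": sum(v < 0 for v in valences) / n,
        "neu": sum(v == 0 for v in valences) / n,
        "pos": sum(v > 0 for v in valences) / n,
        "compound": normalize(sum(valences)),
    }


print(toy_scores("Subsidy reform is good"))
```

With NLTK itself, the equivalent call is `SentimentIntensityAnalyzer().polarity_scores(text)`, which returns a dictionary with the keys `neg`, `neu`, `pos` and `compound`.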

niamir commented 8 months ago

Perhaps a line graph: x = time (year), y = score value, with line 1 = average neg score per year and line 2 = average pos score per year.
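A rough sketch of that aggregation, assuming the scores file can be read into rows with a `year` column next to `neg` and `pos` (these column names are assumptions; the real layout of SentAnalysis_Scores.csv may differ). The helper names `yearly_means` and `plot_yearly_means` are made up for this example, and the plotting part needs matplotlib.

```python
from collections import defaultdict
from statistics import mean


def yearly_means(rows, columns=("neg", "pos")):
    """Average each score column per publication year."""
    by_year = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for col in columns:
            by_year[int(row["year"])][col].append(float(row[col]))
    # Sort by year so the lines are drawn left to right.
    return {
        year: {col: mean(vals[col]) for col in columns}
        for year, vals in sorted(by_year.items())
    }


def plot_yearly_means(means):
    """Draw one line per score column; requires matplotlib."""
    import matplotlib.pyplot as plt

    years = list(means)
    for col in next(iter(means.values())):
        plt.plot(years, [means[y][col] for y in years], label=f"mean {col}")
    plt.xlabel("year")
    plt.ylabel("score")
    plt.legend()
    plt.show()


# Hypothetical rows standing in for csv.DictReader output.
sample = [
    {"year": "2014", "neg": "0.10", "pos": "0.20"},
    {"year": "2014", "neg": "0.30", "pos": "0.40"},
    {"year": "2016", "neg": "0.05", "pos": "0.25"},
]
means = yearly_means(sample)
print(means)
```

Calling `plot_yearly_means(means)` would then draw the two lines; with real data, the rows could come from `csv.DictReader` over the scores file.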

rkrug commented 8 months ago

There are more things that could be done - e.g. mapping to countries of first authors.

rkrug commented 8 months ago

Please see https://ipbes-data.github.io/IPBES_TCA_Ch5_subsidies_reform/Report.html#sentiment-analysis-1 for the preliminary analysis.

rkrug commented 8 months ago

@niamir Any ideas about points 2-5? Will we be involved?

niamir commented 8 months ago

Points 2-5 are pending input from the TCA team.

rkrug commented 8 months ago

https://docs.google.com/spreadsheets/d/1ZCB_St2TQu_wL3yl1iN7Wxz5FERGxk-GZA0_2oGcRM0/edit#gid=210658071

IPBES-Data / IPBES_TCA_Ch5_subsidies_reform

For Frankfurt Meeting #1