Armand1 / Evolution-Revolutions

This is a continuation of the Evolution and Ecology text mining project

Topic Frequencies #5

Open SamMckaylin opened 4 years ago

SamMckaylin commented 4 years ago

I've managed to get the topic analysis to work and output plots. Compared with those in "The Nature of Ecology and Evolutionary Biology", the frequencies are very different. Do you know how the data were subset in that case? I'm going to change the error bars from standard deviation to standard error and change the plot to a three-year moving average, but I wanted to understand what decisions you made with regard to subsetting, and why. Currently I am using all of the ecology and evolution papers from 1850-2010.
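For concreteness, the two planned changes (standard error in place of standard deviation, and a three-year moving average) could be sketched roughly as below. This is a minimal Python illustration, not the project's R code; the function names, the treatment of the first and last years, and the sample numbers are all assumptions made for the example.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: the SD divided by sqrt(sample size)."""
    return sd / math.sqrt(n)


def moving_average(values, window=3):
    """Centered moving average over `window` consecutive years.

    Assumption: at the edges, the average is taken over whichever
    neighbours exist, so the output has the same length as the input.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical yearly mean frequencies for a single topic.
yearly_means = [0.02, 0.04, 0.06, 0.04]
smoothed = moving_average(yearly_means)

# Hypothetical SD of 0.03 across 9 papers in one year.
se = standard_error(sd=0.03, n=9)
```

Because the standard error shrinks with the number of papers per year, error bars drawn this way will be much narrower than SD bars in years with many papers.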

Plot from current analysis (y-axis scaled to match the plot below, to show the difference in frequencies):

Screenshot 2020-04-13 at 13 33 46

Plot from "The Nature of Ecology and Evolutionary Biology":

Screenshot 2020-04-09 at 10 57 26

Armand1 commented 4 years ago

@SJMcKay, I don't think this is a subsetting difference. I am going to send you two R files. They're not super neat, but they should help you identify why you're not getting my results.

The first, "means and standard deviations for all topics", gets the means and SDs for all topics by year. It outputs a file called "EEpapers_topics_byyear_05.csv".
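As a rough picture of what a by-year summary like this involves, here is a minimal Python sketch. It is not the R script referred to above; the input layout (one row per paper and topic, holding a year, a topic name, and a topic weight), the function name, and the sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def topic_stats_by_year(rows):
    """Group topic weights by (year, topic) and return mean, SD, and count.

    `rows` is an iterable of (year, topic, weight) tuples (assumed layout).
    The sample SD needs at least two values, so single-paper groups get NaN.
    """
    groups = defaultdict(list)
    for year, topic, weight in rows:
        groups[(year, topic)].append(weight)

    stats = {}
    for key, weights in sorted(groups.items()):
        sd = stdev(weights) if len(weights) > 1 else float("nan")
        stats[key] = (mean(weights), sd, len(weights))
    return stats


# Hypothetical topic weights for three papers.
rows = [
    (1990, "climate", 0.10),
    (1990, "climate", 0.30),
    (1991, "climate", 0.20),
]
stats = topic_stats_by_year(rows)
```

Keeping the per-year count alongside the mean and SD makes it straightforward to derive a standard error later without going back to the paper-level data.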

The second, "historical topics", takes "EEpapers_topics_byyear_05.csv" as an input and produces my figures above.

Your job is to look at them and figure out why you're not getting the same thing. Do this ASAP and we'll have a Skype meeting this pm or tomorrow.

SamMckaylin commented 4 years ago

Screenshot 2020-04-13 at 11 14 04

SamMckaylin commented 4 years ago

For reference, my original climate plot (using the wrong data) looked like this:

Screenshot 2020-04-15 at 19 16 10

Armand1 commented 4 years ago

Exactly. An interesting lesson here in being very careful how you graph things. Can't be taught --- can only be learned; part of the craft. So well done.