brieaspasia / mp-diagnostics

Initial analysis of marine pollution research

mp-topics #7

Open brieaspasia opened 4 years ago

brieaspasia commented 4 years ago

Keyword network diagram

brieaspasia commented 4 years ago

I've solved the issue with the graphics output by setting the parameters in the initial knitr chunk options.

brieaspasia commented 4 years ago

How can I tag the keywords with the article's country in order to make a table of the top keywords for each country?

mlagisz commented 4 years ago

You can create subsets for the n top countries and run the keyword analyses on each subset/country. Note that authors' countries are not a standard attribute of the bibliographic data frame; you need to extract this information from the affiliation attribute using the function metaTagExtraction:

M <- metaTagExtraction(bib, Field = "AU_CO", sep = ";")
dim(M)

M_USA <- M[grep("USA", M$AU_CO), ] # create subset with authors from USA
dim(M_USA)
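Building on that, one way to tabulate the top keywords per country is sketched below. This is only an illustration: the tiny `M` data frame here is a hypothetical stand-in for the real `metaTagExtraction()` output, and the choice of the `DE` (author keywords) field and the top-country cutoff are assumptions.

```r
# Tiny stand-in for the real M returned by metaTagExtraction(bib, Field = "AU_CO")
M <- data.frame(
  AU_CO = c("USA;CHINA", "USA", "AUSTRALIA;USA", "CHINA"),
  DE    = c("MICROPLASTICS;POLLUTION", "MICROPLASTICS",
            "PLASTIC DEBRIS;POLLUTION", "POLLUTION"),
  stringsAsFactors = FALSE
)

# Count author countries and keep the most frequent ones
country_counts <- sort(table(unlist(strsplit(M$AU_CO, ";"))), decreasing = TRUE)
top_countries  <- names(country_counts)[1:2]

# For each top country, tabulate its most frequent author keywords (DE field)
top_keywords <- lapply(top_countries, function(cc) {
  sub <- M[grep(cc, M$AU_CO), ]
  kw  <- trimws(unlist(strsplit(sub$DE, ";")))
  head(sort(table(kw), decreasing = TRUE), 10)
})
names(top_keywords) <- top_countries
```

With the real data, replace the stand-in `M` with the metaTagExtraction() output and widen the cutoff to however many countries you need.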

brieaspasia commented 4 years ago

I created a file that has the frequency of keywords for each of the top five countries but I'm not sure how to visualise it. I was trying to make either a heatmap or a lollipop chart, but neither was working. What is the best way to show any major differences between the topics in these countries?

topics_country.txt
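For a country-by-keyword comparison, a tile heatmap is one straightforward option. The sketch below is hedged: it assumes a long-format data frame with country, keyword, and frequency columns (as in topics_country.txt), and the `freqs` object here is made-up illustrative data, not the real file.

```r
library(ggplot2)

# Hypothetical long-format frequency table standing in for topics_country.txt
freqs <- data.frame(
  country = rep(c("USA", "China", "UK"), each = 3),
  keyword = rep(c("microplastics", "pollution", "debris"), times = 3),
  freq    = c(40, 25, 10, 30, 35, 5, 12, 8, 20)
)

# Heatmap: one tile per country/keyword pair, shaded by frequency
p <- ggplot(freqs, aes(x = country, y = keyword, fill = freq)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "steelblue") +
  labs(x = "Country", y = "Keyword", fill = "Frequency")

ggsave("keywords_by_country_heatmap.png", p, width = 6, height = 4)
```

Restricting the plot to, say, the ten most frequent keywords overall keeps the y-axis readable.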

brieaspasia commented 4 years ago

I tried creating 2 wordclouds to compare recent research (2010-2019) and early research (1970-2009) but neither method worked. In mp-topics I subsetted bib (lines 30-31) to create two files to run individually through lines 50-113, however my computer kept crashing at line 63 because of the filesize. A trial with a smaller subset of the data worked. I also tried to make a loop by assigning an era at lines 34-35, the loop is lines 115-177 but I haven't set it up right because the filter function at 123 doesn't work for a matrix.
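Since there are only two eras, plain base-R subsetting avoids the loop and the matrix/filter problem entirely. A minimal sketch, assuming `bib` has a `PY` publication-year column (standard in bibliometrix data frames; the stand-in `bib` below is made up for illustration):

```r
# Stand-in for the bibliographic data frame; the real bib has a PY year column
bib <- data.frame(PY = c(1985, 2001, 2012, 2018),
                  TI = paste("paper", 1:4),
                  stringsAsFactors = FALSE)

early  <- bib[bib$PY >= 1970 & bib$PY <= 2009, ]  # early research subset
recent <- bib[bib$PY >= 2010 & bib$PY <= 2019, ]  # recent research subset
# then run the keyword/wordcloud code once on each subset
```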

mlagisz commented 4 years ago

@brieaspasia Good work with making nice wordclouds! I just tested it on the "early" data subset (the full data crashes my computer). I made some changes and pushed them back to GitHub. Don't use the loop: since you only have two subsets, just write the code for each (you can later turn this code into a function and run it on each data subset, if you want to be neat). Also, there is a problem with saving the wordcloud images to a file; I added a suggestion I found on the net, but had no time to test it myself.
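The "turn it into a function" suggestion could look roughly like this. The function name and signature are assumptions, not code from the repo; it just packages the wordcloud-per-subset steps so each era is one call.

```r
# Hedged sketch: one function wrapping the per-subset wordcloud steps
# (names and defaults are assumptions, not the repo's actual code)
plot_keyword_cloud <- function(keywords, outfile) {
  freqs <- sort(table(keywords), decreasing = TRUE)
  jpeg(outfile, width = 1600, height = 800)
  wordcloud::wordcloud(names(freqs), as.numeric(freqs),
                       min.freq = 5, max.words = 50, random.order = FALSE)
  dev.off()
}

# plot_keyword_cloud(early_keywords,  "wordcloud_early.jpg")
# plot_keyword_cloud(recent_keywords, "wordcloud_recent.jpg")
```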

brieaspasia commented 4 years ago

The two wordclouds are ready to run in the mp-topics RMD. I tested with a subset of the data and the code should work, but my computer can't process the full set. @itchyshin let me know if you can run and commit the outputs, thank you.

mlagisz commented 4 years ago

Might be useful - wordclouds for keywords of publications - based on keyword frequency (easy to run and save):

install.packages(c("tm", "wordcloud", "RColorBrewer","knitr","bibtex","devtools"))
require(tm)
require(wordcloud)
require(RColorBrewer)
require(bibtex)
library(devtools)
#install_github("knitcitations", "cboettig")
#require(knitcitations)

load(file = "./data/bib.RData") #to load this data frame from a RData file (returns an object named "bib")
names(bib)

bib$DE[1:10]

keywords <- unlist(strsplit(bib$DE, split = ";  "))
keywords <- keywords[!is.na(keywords)]
keywords <- gsub(" ", "_", keywords)
str(keywords)
length(unique(keywords))
ap.corpus <- Corpus(VectorSource(keywords))
ap.tdm <- TermDocumentMatrix(ap.corpus, control = list(tolower = FALSE))
#str(ap.tdm)

ap.m <- as.matrix(ap.tdm)
#str(ap.m)
ap.v <- sort(rowSums(ap.m), decreasing = TRUE)
head(ap.v)
ap.v[1:10]

ap.d <- data.frame(word = names(ap.v), freq = ap.v)
str(ap.d)
table(ap.d$freq)
ap.d$word <- gsub("_", " ", ap.d$word)

#brewer.pal.info #https://www.datanovia.com/en/blog/the-a-z-of-rcolorbrewer-palette/
mypal <- brewer.pal(9,"YlOrRd")
#mypal <- brewer.pal(9,"Greens")

jpeg("wordcloud_keywords.jpg", width=1600, height=800)
#wordcloud(ap.d$word, ap.d$freq, scale=c(5,.5), min.freq=5, max.words=100, random.order=FALSE, rot.per=.2, colors=mypal)
wordcloud(ap.d$word, ap.d$freq, scale=c(4,.5), min.freq=5, max.words=50, random.order=FALSE, rot.per=0, fixed.asp=FALSE, colors=mypal)
dev.off()

pdf("wordcloud_keywords.pdf", width=16, height=8)
#wordcloud(ap.d$word,ap.d$freq, scale=c(5,.5), min.freq=5, max.words=100, random.order=FALSE, rot.per=.2, colors=mypal)
wordcloud(ap.d$word, ap.d$freq, scale=c(4,.5), min.freq=5, max.words=50, random.order=FALSE, rot.per=0, fixed.asp=FALSE, colors=mypal)
dev.off()

#save frequency table to a file
write.csv(ap.d,"wordcloud_keywords.csv",row.names = FALSE)

mlagisz commented 4 years ago

@brieaspasia I managed to run your code for both wordclouds (they came out very nice - well done!) I pushed all the resulting files into the "figures" sub-directory.

brieaspasia commented 4 years ago

Thank you!