Closed rdatasculptor closed 6 years ago
Yes, I completely agree. I have such a function available locally as part of another package (which is not distributed, however). I believe plotting functionality should go into a separate R package, as plotting is not udpipe-specific and should work for other text mining R packages as well.
Nice to hear you agree with that. I also agree it shouldn't be part of udpipe itself. I think it is possible with ggraph, the package you use in your example visualisations, but I haven't figured out yet how to implement that in e.g. the cooccurrence visualisation. Any plans to use your function in your visualisations?
The tidygraph R package has some measures of centrality, as does the igraph package, if you want to obtain that; or you can just start at the help of ?geom_node_text to specify the size of each node.
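To illustrate the centrality measures mentioned above, here is a minimal igraph sketch on a small toy graph (not udpipe output; the graph here is just an invented example):

```r
library(igraph)

# Toy undirected graph: a ring of 5 vertices plus one chord between 1 and 3
g <- make_ring(5)
g <- add_edges(g, c(1, 3))

# Degree centrality: number of edges incident to each vertex
degree(g)
# 3 2 3 2 2

# Betweenness centrality: how often a vertex lies on shortest paths
betweenness(g)
```

Values like these can then be mapped onto node size in a ggraph plot, e.g. via geom_node_text(aes(size = ...)).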
Currently I have no plans to include such plots inside the package. I was even thinking of using other graph packages than ggraph (https://github.com/iankloo/sigmaNet in particular), but it's low on my todo list. Feel free to contribute if you have time available.
I will give it a try when I have time! That will not be before next week.
Here is some inspiration. Feel free to report what you finally come up with.
library(udpipe)
data(brussels_reviews)
comments <- subset(brussels_reviews, language %in% "es")
ud_model <- udpipe_download_model(language = "spanish")
ud_model <- udpipe_load_model(ud_model$file_model)
x <- udpipe_annotate(ud_model, x = comments$feedback, doc_id = comments$id)
x <- as.data.frame(x)
cooc <- cooccurrence(x = subset(x, upos %in% c("NOUN", "ADJ")),
                     term = "lemma",
                     group = c("doc_id", "paragraph_id", "sentence_id"))
head(cooc)
nodes <- txt_freq(subset(x, upos %in% c("NOUN", "ADJ"))$lemma)
nodes$name <- nodes$key
nodes$nodesize <- nodes$freq
library(igraph)
library(ggraph)
library(ggplot2)
wordnetwork <- head(cooc, 30)
wordnetwork <- graph_from_data_frame(wordnetwork,
                                     vertices = subset(nodes, name %in% c(wordnetwork$term1, wordnetwork$term2)))
ggraph(wordnetwork, layout = "fr") +
  geom_edge_link(aes(width = cooc, edge_alpha = cooc), edge_colour = "pink") +
  geom_node_text(aes(label = name, size = nodesize), col = "darkgreen") +
  theme_graph(base_family = "Arial Narrow") +
  theme(legend.position = "none") +
  labs(title = "Cooccurrences within sentence", subtitle = "Nouns & Adjectives")
Thank you for this inspiration! I think this code is at least 90% of what I had in mind. Monday or Tuesday I will be back at my laptop and try to work on this code. I will let you know the result.
I think the code is very good. I altered it a little bit to get a node size that (in my opinion) fits the network selection better. This way it only uses the word frequency within this selection of the top 30 cooccurrences.
library(udpipe)
data(brussels_reviews)
comments <- subset(brussels_reviews, language %in% "es")
ud_model <- udpipe_download_model(language = "spanish")
ud_model <- udpipe_load_model(ud_model$file_model)
x <- udpipe_annotate(ud_model, x = comments$feedback, doc_id = comments$id)
x <- as.data.frame(x)
cooc <- cooccurrence(x = subset(x, upos %in% c("NOUN", "ADJ")),
                     term = "lemma",
                     group = c("doc_id", "paragraph_id", "sentence_id"))
head(cooc)
#nodes <- txt_freq(subset(x, upos %in% c("NOUN", "ADJ"))$lemma)
#nodes$name <- nodes$key
#nodes$nodesize <- nodes$freq
library(igraph)
library(ggraph)
library(ggplot2)
library(dplyr)
wordnetwork <- head(cooc, 30)
nodes1 <- data.frame(name = wordnetwork$term1, freq = wordnetwork$cooc)
nodes2 <- data.frame(name = wordnetwork$term2, freq = wordnetwork$cooc)
nodes <- group_by(bind_rows(nodes1, nodes2), name)
nodes <- summarise(nodes, nodesize = sum(freq))
wordnetwork <- graph_from_data_frame(wordnetwork,
                                     vertices = subset(nodes, name %in% c(wordnetwork$term1, wordnetwork$term2)))
ggraph(wordnetwork, layout = "fr") +
  geom_edge_link(aes(width = cooc, edge_alpha = cooc), edge_colour = "pink") +
  geom_node_text(aes(label = name, size = nodesize), col = "darkgreen", repel = TRUE) +
  theme_graph(base_family = "Arial Narrow") +
  theme(legend.position = "none") +
  labs(title = "Cooccurrences within sentence", subtitle = "Nouns & Adjectives")
Thanks for the feedback on what you've come up with.
First of all: thank you for making udpipe available in R! It's a great package.
I was looking at your example network visualisations and I was wondering if they could be improved by showing not only different edge sizes but also different word (node) sizes, dependent on the sum of all edge sizes of the edges linked to the node. In my opinion this would result in a wordcloud 2.0. I am curious to hear your opinion on this.
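A minimal sketch of that idea (node size = sum of the weights of all edges incident to the node), using igraph's strength(). The edge list here is a made-up toy example, not real udpipe output:

```r
library(igraph)

# Hypothetical co-occurrence edge list (term1, term2, cooc count)
edges <- data.frame(term1 = c("casa", "casa", "barrio"),
                    term2 = c("bonita", "grande", "bonito"),
                    cooc  = c(5, 3, 2))

g <- graph_from_data_frame(edges, directed = FALSE)

# Node size = sum of the cooc counts of all edges touching each node
node_size <- strength(g, weights = E(g)$cooc)
node_size
# casa = 8, barrio = 2, bonita = 5, grande = 3, bonito = 2
```

The resulting vector could then be attached to the vertices and mapped to size in geom_node_text().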