HyunkuKwon opened this issue 4 years ago
I am not at all familiar with network analysis, so I wonder what the norm is for choosing an example graph. The author chose data on marriages among Renaissance families in Florence, but how does he know this graph alone is sufficient to support his argument? What if the graph is more complicated (like the one in Fig. 5), is bigger or smaller in size, is a tree, etc.?
Clustering with data in nodes: Like @wanitchayap, I am not at all familiar with network analysis either. So I was wondering: when we do clustering, could we also use the information contained in the nodes to improve accuracy, or should we stick to the analysis of edges?
For example, in the case of clustering proteins based on their functions, could we throw in additional data about the proteins, such as their empirical formulae and R-values, to improve clustering performance? Or should we stick to what the edges represent, which in the example from Fortunato (2010) is whether the two proteins take part in the same process?
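To make the question concrete, here is a minimal sketch of what I have in mind, in Python with networkx and scikit-learn. The stand-in graph, the random feature vectors, and the 0.5 mixing weight are all my own placeholder assumptions, not anything from Fortunato (2010):

```python
# Minimal sketch: blend edge structure with node-attribute similarity
# before clustering. All inputs here are placeholders.
import networkx as nx
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import cosine_similarity

G = nx.karate_club_graph()  # stand-in for a protein interaction network
rng = np.random.default_rng(0)
features = rng.random((G.number_of_nodes(), 5))  # pretend attribute vectors

A = nx.to_numpy_array(G)         # what the edges say
S = cosine_similarity(features)  # what the node attributes say
blend = 0.5 * A + 0.5 * S        # equal weighting is an assumption

labels = SpectralClustering(
    n_clusters=2, affinity="precomputed", random_state=0
).fit_predict(blend)
print(labels)
```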
Since I have never been exposed to network analysis before, my question may be somewhat general.
In this paper, the author introduces four kinds of centrality measures, i.e., closeness centrality, betweenness, eigenvector centrality, and degree centrality, along with their strengths and weaknesses. For example, closeness centrality is inappropriate for flows involving parallel duplication, and betweenness is not ideally suited for flows like infections and gossip. It seems different centrality measures complement each other, so I wonder if there is some "ensemble" method that can combine them?
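One naive "ensemble" I could imagine is simply averaging each node's rank across the four measures. A hedged sketch with networkx; the equal weighting is my own assumption, and Borgatti would presumably object that the right measure should instead follow from the flow process:

```python
# Naive rank-average "ensemble" of four centrality measures.
# Equal weights are an assumption for illustration only.
import networkx as nx
from scipy.stats import rankdata

G = nx.karate_club_graph()
measures = [
    nx.degree_centrality(G),
    nx.closeness_centrality(G),
    nx.betweenness_centrality(G),
    nx.eigenvector_centrality(G),
]
nodes = list(G)
# Higher centrality -> higher rank; then average the ranks per node.
ranks = [rankdata([m[n] for n in nodes]) for m in measures]
ensemble = {n: sum(r[i] for r in ranks) / len(ranks)
            for i, n in enumerate(nodes)}
print(sorted(ensemble, key=ensemble.get, reverse=True)[:5])  # top-5 nodes
```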
My question relates to the Bayesian Echo Chamber paper that was shared in the code. I believe the results of this paper could be incredibly useful for my final project, so I wanted to get your thoughts on adapting the methodology to a different setting. Would it be possible to measure influence via linguistic accommodation in the context of a social media network such as Twitter?
The premise here is that this is a public forum in which some voices are more influential than others, and the influential voices shape the semantic and linguistic characteristics of the conversation. If we set out to measure linguistic accommodation in this setting, should we be concerned that our sample of tweets might include some (or many) observations that never actually interact with one another (e.g., user A and user B might post tweets discussing the same thing one after another without ever reading each other's tweets)?
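One guard I have considered is to build a directed reply/mention graph first and only measure accommodation along observed interactions. A tiny sketch with hypothetical tweet records; the field names and users are made up:

```python
# Sketch: restrict accommodation measurement to pairs with a documented
# interaction, here represented as reply edges. Data is invented.
import networkx as nx

tweets = [
    {"user": "A", "reply_to": None},
    {"user": "B", "reply_to": "A"},   # B actually responded to A
    {"user": "C", "reply_to": None},  # C never interacted with anyone
]

G = nx.DiGraph()
for t in tweets:
    G.add_node(t["user"])
    if t["reply_to"]:
        G.add_edge(t["user"], t["reply_to"])  # edge = documented interaction

# Candidate pairs for an accommodation model: only observed interactions.
print(list(G.edges()))  # [('B', 'A')]
```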
For Fortunato, Santo. 2010: “Community Detection in Graphs.”
Amazing paper on network analysis; I think it provides systematic knowledge for us to understand this field and its implications. Here are my questions:
Considering the Clique Percolation Method (CPM) mentioned in Section 11, it is pointed out that with CPM "there is a considerable fraction of vertices left out of the communities," like cutting the leaves. To tackle this issue, a post-processing procedure is needed to re-include these nodes in the communities. I wonder whether this post-processing is done manually or by algorithm? If it is done by setting a new criterion/algorithm, then it and CPM together could form a combo that tackles the issue. However, if it is done manually, like our setting of "stop words" in assignments, then it might be better to improve CPM itself rather than combining CPM with a post-processing procedure.
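For what it is worth, here is a sketch of what an algorithmic (rather than manual) post-processing step could look like: run k-clique percolation, then attach each left-out node to the community containing most of its neighbors. The majority-neighbor rule is my own assumption, not a procedure from the paper:

```python
# Sketch: CPM plus an algorithmic re-attachment of left-out nodes.
import networkx as nx
from networkx.algorithms.community import k_clique_communities

G = nx.karate_club_graph()
communities = [set(c) for c in k_clique_communities(G, 4)]

covered = set().union(*communities)
leftovers = set(G) - covered
for node in leftovers:
    # Count this node's neighbors in each community, pick the best match.
    overlaps = [len(set(G[node]) & c) for c in communities]
    if max(overlaps) > 0:
        communities[overlaps.index(max(overlaps))].add(node)

print(communities)
```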
Though I may just have missed the description of it, since this paper is so long, I wonder whether we could apply ensemble methods to these community-detection algorithms, as @linghui-wu also suggested (she mentioned it for centrality)? From the introduction of the paper, many models have their pros and cons, which seems a good reason to build an ensemble. Returning to Question 1, the loopholes of CPM could also be addressed by an ensemble. Also, what are the potential difficulties in ensembling community-detection methods?
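A sketch of one such ensemble, consensus clustering: run several detection algorithms, record how often each pair of nodes is co-assigned, and then cluster that agreement matrix. The choice of algorithms and the final number of clusters are placeholder assumptions:

```python
# Sketch: consensus clustering over several community-detection runs.
import networkx as nx
import numpy as np
from networkx.algorithms import community
from sklearn.cluster import SpectralClustering

G = nx.karate_club_graph()
nodes = list(G)
partitions = [
    community.greedy_modularity_communities(G),
    community.label_propagation_communities(G),
    community.louvain_communities(G, seed=0),
]

n = len(nodes)
co = np.zeros((n, n))
for parts in partitions:
    for block in parts:
        idx = [nodes.index(v) for v in block]
        for i in idx:
            for j in idx:
                co[i, j] += 1
co /= len(partitions)  # fraction of runs placing i and j together

labels = SpectralClustering(
    n_clusters=2, affinity="precomputed", random_state=0
).fit_predict(co)
print(labels)
```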
I think signed and unsigned networks are very important concepts mentioned in this paper. In R, there are already packages for building these networks. My question is more practical: in practice, how do we decide whether to use a signed or an unsigned network in our own project? Should we just try both and see which is more suitable?
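Practically, the same network can be stored either way in code; a sketch (in Python with networkx, with invented edges) where the signed version carries valence as an edge attribute that signed analyses such as structural balance can read. My rough intuition is that the data decides: if ties carry valence (ally/enemy, trust/distrust), keep the sign; if you only observe the presence of a tie, an unsigned graph is all you can justify:

```python
# Sketch: unsigned vs. signed storage of a toy triad.
import networkx as nx

unsigned = nx.Graph()
unsigned.add_edges_from([("A", "B"), ("B", "C"), ("A", "C")])

signed = nx.Graph()
signed.add_edge("A", "B", sign=+1)  # alliance / trust
signed.add_edge("B", "C", sign=-1)  # rivalry / distrust
signed.add_edge("A", "C", sign=-1)

# Structural balance: a triad is balanced if the product of signs is positive.
balanced = (signed["A"]["B"]["sign"] *
            signed["B"]["C"]["sign"] *
            signed["A"]["C"]["sign"]) > 0
print(balanced)  # True: one + and two - edges form a balanced triad
```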
Borgatti (2005):
While reading the article, one question that haunted me was: why can't we do it the other way around? If we need a measure of node centrality, can't we just simulate 10,000 flow processes and use the simulated result as the measure? Or, even more, if we have empirical data on how communication between nodes actually occurred, can't we just evaluate that data? Of course, such measures would be mathematically less valid than the standard ones, but wouldn't they be more realistic? (As a person with close to no understanding of networks, I hope this question makes sense!)
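Something like the following sketch, where visit counts from many simulated random walks stand in for a centrality score; the walk count and length are arbitrary choices for illustration:

```python
# Sketch: simulated random walks; visit counts act as an empirical
# centrality. 10,000 walks of length 20 are arbitrary parameters.
import random
from collections import Counter
import networkx as nx

G = nx.karate_club_graph()
nodes = list(G)
visits = Counter()
random.seed(0)

for _ in range(10_000):
    node = random.choice(nodes)
    for _ in range(20):                      # one short walk
        node = random.choice(list(G[node]))  # step to a random neighbor
        visits[node] += 1

total = sum(visits.values())
simulated = {n: visits[n] / total for n in G}  # normalized visit frequency
print(sorted(simulated, key=simulated.get, reverse=True)[:5])
```

(For an undirected graph, these visit frequencies converge toward degree proportions, which itself illustrates Borgatti's point: the process you simulate determines the measure you recover.)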
For Borgatti, Stephen P. 2005: The author discusses how different centrality measures should be employed in different networks according to the typology of the network flows. It makes me think of this week's homework, where we were asked to construct a word-by-word network and calculate different centrality measures. To be honest, I cannot identify an exact trajectory or mode of spread among the words, so I am wondering how we could employ the insights in this paper to characterize the word-by-word network. Is there a particular measure that fits this kind of network best?
The Borgatti paper helped to contextualize graph theory in the social sciences and provided a structured way to think about the applicability of centrality measures to different problems. I have one question about measuring influence:
Eigenvector centrality was described as an appropriate measure of node influence, and PageRank appears to use a similar approach to rank influential websites. If I understand correctly, "influence" in this case is a measure of proximity to other important nodes, not of direction. Does this allow us to say, for example, that A has an influence on B? Are there other ways of looking at this, or is there another measure that is more appropriate?
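A sketch of the contrast on a toy directed graph: PageRank respects edge direction (an edge A → B is usually read as A pointing to, i.e., endorsing, B), while eigenvector centrality computed on the undirected version only captures proximity to well-connected nodes. The edges below are invented:

```python
# Sketch: direction-aware PageRank vs. direction-blind eigenvector
# centrality on a toy graph.
import networkx as nx

D = nx.DiGraph([("A", "B"), ("C", "B"), ("B", "D")])

pr = nx.pagerank(D)                                # respects edge direction
ev = nx.eigenvector_centrality(D.to_undirected())  # ignores direction
print("PageRank:   ", pr)
print("Eigenvector:", ev)
```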
I have a general question about expected vs. realized centrality. In reality, do we actually know realized centrality, or do we approximate it, the way we estimate a population mean in statistics?
Another question: the author mentions that the mooch process follows a 'transfer-path' framework. What is a more detailed explanation, or a real-life example, of a mooch process?
I have a general question about centrality: it is defined as a measure of the most "influential" entity in a network, but what are the various meanings of "influence" in this context? For example, in the gossip example in Borgatti, is the "central" figure the person spreading the gossip the most, or the person about whom the gossip is being spread?
One obvious limitation is that a centrality measure that is optimal for one application is often sub-optimal for a different application. What can we do to address that?
Borgatti (2005): It helps me categorize different typologies of networks, which correspond to different network dynamics. The paper also introduces four sets of definitions of centrality and makes explicit the assumptions embedded in each definition. The examples make it clear that different centrality measures have different appropriate usage settings. I was not so confident about the Florence transportation example, though. The paths were built for the shortest distance between two places, so by definition they suit the package-delivery model. The author uses the same graph to simulate other flow processes; what if a random-walk process would traverse a different network than the given paths?
Correct me if I am wrong, but networks are typically represented two-dimensionally. Would there be any benefit to modelling them in three dimensions? Would it be possible to add a measure of the distance from one node to another, say the time it takes a delivery driver to reach more distant addresses versus less distant ones, or is the time it takes to transmit between every pair of nodes considered to be the same or not relevant?
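On the second part: edge weights already let an ordinary graph carry transmission time without any third dimension. A sketch with invented travel times:

```python
# Sketch: edge weights as travel time; path and centrality computations
# can respect them. The minutes below are made up.
import networkx as nx

G = nx.Graph()
G.add_edge("depot", "addr1", minutes=5)
G.add_edge("depot", "addr2", minutes=20)
G.add_edge("addr1", "addr2", minutes=8)

# Routing and centrality both honor the weights:
print(nx.shortest_path(G, "depot", "addr2", weight="minutes"))
print(nx.closeness_centrality(G, distance="minutes"))
```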
In the applications listed in Fortunato (2010), the focus seems to be exploratory: understanding the hidden patterns of the network structure. However, this does not help the agents in the network make strategic decisions. As a result, I am wondering how network analysis can help agents make better decisions.
The paper "Centrality and Network Flow" was quite helpful for understanding the implicit assumptions about flow processes behind each centrality measure, so I would be careful to choose a measure that fits my research scenario. I have a general question about flow processes: I used to assume the choice depends on whether we want to treat time as a factor in the interpretation. What is the proper flow process for words or ideology? Are the media important when we are talking about culture? How would network analysis interpret ideas spreading through broadcasting, such as newspapers?
For Fortunato, Santo. 2010, "Community Detection in Graphs": It is extremely long, like a short book, but very informative. I'm wondering, since a decade has passed since it was published, has the empirical data we collect to map communities (e.g., demographic surveys, email correspondence) changed at all? How do we use digital data to map community networks now?
My question is whether the media are important when we are talking about culture.
Post questions here for one or more of our fundamentals readings:
Fortunato, Santo. 2010. "Community Detection in Graphs." Physics Reports 486(3-5): 75-174.
Borgatti, Stephen P. 2005. "Centrality and Network Flow." Social Networks 27(1): 55-71.