Computational-Content-Analysis-2020 / Readings-Responses

Repository for organising "exemplary" readings, and posting responses.

Extracting Communication Networks - Danescu-Niculescu-Mizil et al 2013 #23

Open jamesallenevans opened 4 years ago

jamesallenevans commented 4 years ago

Post questions here for:

Danescu-Niculescu-Mizil, C., West, R., Jurafsky, D., Leskovec, J. and Potts, C., 2013. “No country for old members: User lifecycle and linguistic change in online communities.” In Proceedings of the 22nd international conference on World Wide Web: 307-318.

laurenjli commented 4 years ago

This was a very interesting article that takes changes in a user's linguistic adaptation and uses them to predict the user's lifecycle in an online community. The authors used text from two beer rating communities, both of which have a specific and narrow focus. This makes me wonder how far their findings, or their method for identifying user engagement over time, can be extended to a more general online community in which the topic may be broader. Furthermore, the authors only mentioned using reviews, and oftentimes online communities have features like liking posts or commenting. How might these additional network elements enhance or change their methods?

katykoenig commented 4 years ago

The authors' findings regarding the relationship between linguistic adaptation and the lifecycle of a user were motivating, but I found their application of these findings in supervised machine learning somewhat lacking. Specifically, the paper generated novel features for labelling, which was intriguing, but the performance of their classifier seemed quite low. For all communities, the F1 score was about as good as a coin flip (or worse), and looking more closely, we see that precision was high across samples while recall was really low, reflecting that the classifier is labeling almost all of the observations as negative, no matter the features. So, I am wondering 1.) what was their threshold for classification, given that they used a logistic classifier, e.g. was an output of 0.5 and above classified as a positive, or something higher? and 2.) would they have had better results using a different classifier?
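To make the threshold question concrete, here is a minimal sketch of how precision, recall, and F1 move when the cut-off on a classifier's predicted probability changes. The labels and scores are invented toy values, not results from the paper, and `precision_recall_f1` is a hypothetical helper written for illustration.

```python
def precision_recall_f1(y_true, scores, threshold=0.5):
    """Compute precision, recall and F1 for the positive class at a given cut-off."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (assumed for illustration): 1 = departed, 0 = living.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.90, 0.45, 0.40, 0.35, 0.30, 0.20, 0.42, 0.10]

for threshold in (0.5, 0.33):
    p, r, f = precision_recall_f1(y_true, scores, threshold)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

On this toy data a 0.5 cut-off gives high precision but low recall, the same pattern described above, while a lower cut-off trades a little precision for much higher recall. Whether that would hold on the paper's data is an open question.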

ziwnchen commented 4 years ago

This article provides inspiring findings on users' linguistic changes in online communities. However, as @katykoenig mentioned, I also have some questions about the final prediction part:

(1) I'm not sure I fully understand the binary classes. It seems that the logistic regression the authors use could assign users to either a "departed" group who leave within a relatively short range (e.g., 20 to 50 posts) or a "living" group who continue to be active after many posts (e.g., 200+ posts). But what about those in between (i.e., those who leave after 50~200 posts)?

(2) Compared to the simple activity model, all "full" models with the 5 features suffer from a significant drop in precision and an increase in recall... this looks like a shift in labeling strategy rather than an actual performance improvement...

(3) One of the most important findings in this article is the relativity of the user lifecycle (e.g., the user lifecycle is not bound to absolute, biological time). However, features related to this finding do not appear in the final prediction section, which confuses me.
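The labelling question in point (1) can be sketched as follows. This is only one reading of the setup, using the example ranges quoted above (20, 50, and 200 posts) as default parameters. The function names and the exact boundary rules are assumptions, not the paper's definition.

```python
def label_user(total_posts, w=20, m=50, n=200):
    """Label a user by total post count.

    Assumed rule (illustrative only): users with fewer than w posts are
    skipped, users who stop between w and m posts are "departed", users
    with at least n posts are "living", and everyone in between gets no
    label and is left out of the binary task.
    """
    if not (w <= m < n):
        raise ValueError("expected w <= m < n")
    if total_posts < w:
        return None
    if total_posts <= m:
        return "departed"
    if total_posts >= n:
        return "living"
    return None


def class_counts(post_counts, **kwargs):
    """Count how many users fall in each class, including the excluded ones."""
    counts = {"departed": 0, "living": 0, "excluded": 0}
    for total in post_counts:
        label = label_user(total, **kwargs)
        counts[label if label is not None else "excluded"] += 1
    return counts


# Toy post counts, invented for illustration.
print(class_counts([10, 30, 45, 120, 250, 400]))
print(class_counts([10, 30, 45, 120, 250, 400], m=100, n=150))
```

Under this reading, the in-between users are simply dropped, and rerunning `class_counts` with other values of `m` and `n` shows how many users each choice discards.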

sunying2018 commented 4 years ago

This paper proposes an innovative framework for tracking linguistic changes and understanding how users react to these evolving norms. The result is really fascinating: they identified a two-stage lifecycle in terms of linguistic change. However, I have some questions about the features they used for learning and about the binary logistic classifier. In terms of features, the article does not say much about the feature selection procedure or whether there is collinearity in the feature space. Besides, as mentioned by @katykoenig, the selected model does not seem to perform well. So what is the reason for selecting this model, and did they compare other ML models to achieve better performance?
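One simple way to probe the collinearity question is to compute pairwise Pearson correlations between feature columns and flag highly correlated pairs. The sketch below is generic. The feature names and values are hypothetical placeholders, not the paper's feature set or data, and the 0.9 cut-off is an arbitrary choice.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return cov / (sd_x * sd_y)


def flag_collinear(features, cutoff=0.9):
    """Return (name_a, name_b, r) for every feature pair with |r| >= cutoff."""
    names = sorted(features)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) >= cutoff:
                flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical per-user feature values, invented for illustration.
toy_features = {
    "cross_entropy": [5.1, 4.8, 4.2, 4.0, 4.4],
    "self_similarity": [0.20, 0.25, 0.33, 0.36, 0.30],
    "post_length": [120, 80, 150, 60, 100],
}
print(flag_collinear(toy_features))
```

Pairwise correlation only catches collinearity between two features at a time. A variance inflation factor would be the more thorough check if several features are jointly redundant.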

ccsuehara commented 4 years ago

“It takes a long time to become young” was a quote I didn't understand at first, but then it made complete sense to put it there, since linguistic adolescence, or the time by which members are fully adapted to the language of the community, is not necessarily a linear process like age.

I was wondering if maybe the results of figure 7 (users) are also reinforced by the results in figure 5 (community), and vice versa: in figure 7 we see that language flexibility stagnates at a certain level, conservative language tends to reign after a while, and lexical innovations decrease rapidly. The result of this, in the aggregate, is more predictability in the language used (figure 5). The users influence the community they are part of, but the community also influences the users. How could we account for that effect in this research?

bjcliang-uchi commented 4 years ago

I am pretty much convinced by the part on language flexibility and linguistic progressiveness. However, I am not entirely sure about the robustness of the prediction task to changes in m and n (the departed and living ranges). Also, I am wondering how much of this linguistic difference is purely about adopting community words. Is it possible that what the algorithm captures is instead the authors' tiredness of beer, disappointment with the community, etc., which leads to their departure?

ckoerner648 commented 4 years ago

Danescu-Niculescu-Mizil et al 2013 analyzed ten years of online comments from two different beer-rating websites. They find that the longer a person is a member, the less likely she is to adopt new vocabulary; instead, she sticks to the terms she used early in her membership. New members are very likely to adopt new terms. The results are exciting: they can help to identify users who are likely to depart from a community or to predict a user’s success. It is interesting to see which variables predict change. Hence, I’d be curious whether there are other variables that could explain how members of a community create a relatively stable common language across generations or other differences.

HaoxuanXu commented 4 years ago

This paper is really interesting in showing users' levels of cultural embeddedness in the forum based on the cross-entropy of their posts. It'd be interesting to know whether this model can predict which stage the user is in based on a specific post. It may be difficult to tell, using just one specific post, whether the user is at the stage of increasing cultural assimilation to the community or at the stage of decreasing cultural assimilation.
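For readers who want to see what "cross-entropy of a post" means operationally, here is a minimal sketch that scores a single post against a bigram language model built from a snapshot of community text. It uses add-one smoothing for simplicity, which is an assumption and may differ from the paper's language models. The toy posts and function names are invented for illustration.

```python
import math
from collections import Counter


def train_bigram(posts):
    """Build bigram and context counts from a list of whitespace-tokenised posts."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for post in posts:
        tokens = ["<s>"] + post.lower().split()
        vocab.update(tokens)
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    # Reserve one extra vocabulary slot for unseen words.
    return contexts, bigrams, len(vocab) + 1


def cross_entropy(post, model):
    """Average negative log2 probability per token of a post under the model."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + post.lower().split()
    pairs = list(zip(tokens[:-1], tokens[1:]))
    if not pairs:
        raise ValueError("post has no tokens")
    log_prob = sum(
        math.log2((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in pairs
    )
    return -log_prob / len(pairs)


# Toy "snapshot" of community language, invented for illustration.
snapshot = [
    "nice aroma and hazy pour",
    "hazy pour with citrus aroma",
    "citrus aroma and nice lacing",
]
model = train_bigram(snapshot)
print(cross_entropy("hazy pour and citrus aroma", model))  # closer to the community
print(cross_entropy("smells fine looks cloudy", model))    # further from it
```

A lower value means the post is more predictable given the community's language at that time. A single post yields a single number, so placing a user in a lifecycle stage would still require the trajectory across posts, which is the difficulty raised above.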

skanthan95 commented 4 years ago

This paper revealed that users follow a determined two-stage lifecycle: a linguistically innovative learning phase in which users align with the language of the community, followed by a conservative phase in which users stop responding to changes in community norms.

In this paper, the authors study online communities where individuals rate (and discuss) different types of beer. I wonder how their results might've differed if they looked at other sorts of communities - for example, an online support group, where a person might be more likely to form closer bonds with the other members (and feel incentivized to continue adhering to the linguistic norms) than they might if it was a group about shared superficial interests. I would imagine that participants would be more likely to use similar language with members of a group that they felt more deeply connected to - would the two-stage lifecycle apply to these sorts of groups as well?

xpw0222 commented 4 years ago

This is a very interesting paper that empirically verifies my assumption about online forums. I am wondering about its generalizability to other forums. These two forums are less open to the general public due to their nature--small communities formed based on interest. Also, these forums are like "monopolies" in the field--users can only choose to stick to them or abandon them, without a choice of switching to alternatives. I guess in more open online forums like Reddit, the pace of evolution will be different. My hypothesis is that the norm is going to switch more slowly because it takes time for every member from different backgrounds to adopt.

sanittawan commented 4 years ago

This paper is certainly an interesting read. My question pertains to an aspect that the paper did not discuss. I wonder how one should go about studying who started a competing trend; for example, was the change from "aroma" to "smell" driven by old or new users? It would be interesting to learn whether some groups of old users, in fact, instigated the change in language use.

yirouf commented 4 years ago

Very interesting paper. I'm becoming very curious about Danescu-Niculescu-Mizil's other work, especially on identifying users' cultural embeddedness in a forum through vocabulary. I find it interesting that what they find on the beer-rating websites is really similar to how people become less creative once they really master something, because they stick to what they know. So I was wondering how this might be accounted for. Might people who are expert in tasks that involve vocabulary display a linguistic pattern that is specific to their mastery rather than to other variables?

chun-hu commented 4 years ago

This is an interesting paper on linguistic change in the community. The authors talked about how users generally die "linguistically old" and how their language becomes more conservative after linguistic adolescence. Is it possible that they turn to other communities that interest them more? Also, I'm curious about another type of user -- one who looks at posts frequently but does not post or engage in the community. Will their language use change at all? I know there is probably no way we can test this, but I'm just curious!

cytwill commented 4 years ago

I am quite interested in the two stages that users experience in online communities regarding their language routines. Does this phenomenon happen in online communities other than beer forums? Some large communities, like social media platforms, include various topics and even different languages. If we need to apply the framework to such a general online environment, what kind of modifications need to be made to the metrics that measure the consistency between individual language and community language?

rkcatipon commented 4 years ago

In some ways then, this piece felt like digital archaeology, in that researchers could only understand a user's linguistic patterns after they were "deceased". I found one of the discovered patterns curious. Why does users' language become more rigid before they leave the platform? While the authors do a nice job describing the beer community, I was left wondering about the causal factors.

Another thought, from what I understood of the paper, an online community is less likely to experience innovation in a language without the acquisition of new users. That makes sense, as a network is exposed to new modes, it is influenced and exhibits behavior change. We could imagine then, that this community without new users would remain in a linguistic stasis. In the physical world, I wonder though, how isolated communities or "uncontacted peoples" are able to develop language without exposure to other groups. There seems to be some divergence here between the physical and digital realm.

arun-131293 commented 4 years ago

The paper's main conclusion, as the authors state it, is: "our framework ... reveals a crucial difference from findings in offline settings: the moment when linguistic adolescence ends— and the user is at a peak linguistic harmony with the community—is not bound to an absolute or biological time-frame, but instead is relative to the users’ own ultimate lifespan." They call this result "surprising", but is it really surprising that people stop being interested enough to learn the newly emerging lingo of a forum just before they leave? Could the causal factor here simply be a growing lack of interest and involvement in learning (and certainly in creating) new words? It seems to me their results are what you would expect.

Additionally, the comparison to natural language learning in humans is misleading; language learning is certainly not a function of interest but a natural biological property of human beings that requires no conscious effort, as has been understood in linguistics for a long time.

kdaej commented 4 years ago

I wonder if a person's language is affected by whichever community they are in at the moment, in any setting. Psychologists have suggested that some human abilities are domain-specific. For instance, if we memorize a text in a certain situation, we can better retrieve the memory in the same setting where the information was acquired. Is it possible that people in this study use the community language only in specific settings, or does it extend to their daily language?

YanjieZhou commented 4 years ago

I find it very interesting to investigate this topic of linguistic aging. As far as I am concerned, a language develops mostly by extending beyond its original dimensions, whether in terms of users or economy. This means that economic development can also spur the development of language, and the young can use the language in brand-new ways that add to its vitality. I do not know whether my idea has been examined or matches existing ideas.

Lizfeng commented 4 years ago

This paper tracks linguistic change as it happens in order to understand how specific users react to these evolving norms. The data come from two large online communities, RateBeer and BeerAdvocate. It discovers that users follow a two-stage lifecycle: a linguistically innovative learning phase and a conservative phase. This framework can be used to predict how long a user will stay active in the community. Thus, it has practical significance for those who design and maintain online communities. I have one concern about this research. I have looked into the RateBeer dataset, and it seems that most of its members are more likely to respond with pictures than with comments. Would it be better if the researchers incorporated techniques that go beyond text into their research?