Open HyunkuKwon opened 4 years ago
Controlling for length effects - Wouldn't it be better to randomly choose k bigrams instead of the first k words? I wonder if choosing the first k words could lead to bias. The first k words may not represent the whole post accurately, especially for longer posts. If users compose their posts in a systematic way (e.g. an introduction first, then the actual content later), the first k words would no longer be representative. Alternatively, wouldn't normalizing the cross-entropy by the length of the post work too?
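To make the comparison in this question concrete, here is a minimal sketch of the two sampling strategies, each scored with a per-bigram (length-normalized) cross-entropy. It assumes a toy add-one-smoothed bigram model; `BigramModel`, `first_span`, and `random_bigrams` are hypothetical names for illustration, not the paper's implementation or its smoothing scheme.

```python
import math
import random
from collections import Counter


def bigrams(tokens):
    """Return the list of adjacent word pairs in a tokenized post."""
    return list(zip(tokens, tokens[1:]))


class BigramModel:
    """Toy bigram language model with add-one smoothing (an assumption)."""

    def __init__(self, posts):
        self.bigram_counts = Counter()
        self.context_counts = Counter()
        vocab = set()
        for tokens in posts:
            vocab.update(tokens)
            for first, second in bigrams(tokens):
                self.bigram_counts[(first, second)] += 1
                self.context_counts[first] += 1
        # Reserve one extra slot so unseen words get non-zero probability.
        self.vocab_size = len(vocab) + 1

    def log_prob(self, bigram):
        first, _ = bigram
        numerator = self.bigram_counts[bigram] + 1
        denominator = self.context_counts[first] + self.vocab_size
        return math.log(numerator / denominator)


def cross_entropy(model, sample):
    """Average negative log-probability per bigram, i.e. normalized by length."""
    if not sample:
        raise ValueError("need at least one bigram")
    return -sum(model.log_prob(bg) for bg in sample) / len(sample)


def first_span(tokens, k):
    """Bigrams from the first k words of the post (k - 1 bigrams)."""
    return bigrams(tokens[:k])


def random_bigrams(tokens, n, rng):
    """n bigrams drawn without replacement from anywhere in the post."""
    candidates = bigrams(tokens)
    return rng.sample(candidates, min(n, len(candidates)))


if __name__ == "__main__":
    community = [
        "pours a hazy golden color with a thick white head".split(),
        "the aroma is citrus and pine with a malty backbone".split(),
    ]
    post = "pours a dark brown color and the aroma is a roasted malty coffee".split()
    model = BigramModel(community)
    k = 6
    rng = random.Random(0)
    print("first k words:", cross_entropy(model, first_span(post, k)))
    print("random bigrams:", cross_entropy(model, random_bigrams(post, k - 1, rng)))
```

Both scores here are averages over the same number of bigrams, so any difference between them reflects which part of the post was sampled rather than how long the post is.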
Prediction task - The first-person singular pronouns feature seems to be the least useful among all features. In fact, it lowers F1 when w=40. Could this be because of the extreme way this feature is coded? That is, it is coded 0 only if there are NO first-person singular pronouns at all. Shouldn't it instead be coded by comparing each post to the community average?
More diverse community - Both beer communities studied here are very specific in topic and purpose. It would be interesting to see whether, and to what extent, this holds for more diverse communities. For example, Twitter as a community doesn't have a specific topic, but there seems to be a unique linguistic style associated with it.
This article reveals some interesting facts about people's linguistic changes in different online communities. My question is about the relativity of the user lifecycle discussed here. I wonder why the features related to this sub-topic did not appear in the final prediction process?
This paper provides an interesting approach to the life cycle of online communities. My first question concerns the binary logistic model. I wonder whether a multinomial logistic classifier might be more suitable, since it is hard to divide users into just two groups, "departed" and "living", based on their number of posts. In the paper, the authors use m and n as thresholds, but the example values given (m=30, n=200) are far apart, which makes me wonder how users with between 30 and 200 posts are classified. What threshold is applied to this "middle" group?
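The gap described in this question can be sketched as a labeling function. This is only an illustration under stated assumptions: the boundary conditions (`<=` and `>=`) and the choice to leave the middle group unlabeled are guesses for the sake of the example, and `label_user` is a hypothetical helper, so the actual rule should be checked against the paper.

```python
from typing import Optional


def label_user(total_posts: int, m: int = 30, n: int = 200) -> Optional[str]:
    """Assign a two-class label from a user's lifetime post count.

    Assumption: users at or below m posts count as "departed", users at or
    above n posts count as "living", and everyone in between gets no label.
    """
    if m >= n:
        raise ValueError("m must be smaller than n")
    if total_posts <= m:
        return "departed"
    if total_posts >= n:
        return "living"
    return None  # the "middle" group the question asks about


if __name__ == "__main__":
    for count in (12, 30, 85, 200, 950):
        print(count, label_user(count))
```

Under this reading the middle group is simply dropped from the binary task; a multinomial classifier, as suggested above, would instead give that `None` branch its own class.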
The second question concerns the external validity of the conclusions. The paper focuses mainly on beer forums, which cannot represent many other kinds of forums. What if the forum is a platform for a broad range of topics instead of a single topic like beer? I wonder how the two-stage lifecycle theory would need to be modified or extended in such a more complex scenario. The finding would be much more valuable if the theory could be applied to online communities in general.
To be honest I am not quite convinced by the saying "All users die old". The authors deliberately delete the users who still post new reviews on the websites after Jan. 2011. This indicates that the users in the sample will definitely abandon the websites before 2011. Given this sample selection strategy, my understanding is that the users who will die will die old, but I cannot extrapolate this conclusion to those who never die. Hence, I am curious whether those who always remain will display a different pattern.
I think this is great research with a very clever design. I especially liked the fact that they succeeded in building a predictive model out of their data. That being said, they suggested that communication in the community fluctuates, but they did not suggest why it fluctuates. The first thing off the top of my head is that as new users are introduced, they give some kind of "shock" to the community with their high cross-entropy remarks, and the community then absorbs some of the linguistic characteristics of that "shock". Was there any follow-up research on this issue?
What a fun paper. Would it be possible to extend this methodology to social media more broadly? There's been some ongoing discussion of the idea that millennials (who were the first users of Facebook) have started shifting away from Facebook to other platforms (e.g. Instagram), and that part of this was prompted by older generations increasing their use of Facebook. It would be cool to see whether linguistic changes can be tracked and similarly used to predict the life cycle of a Facebook (or other social media) user.
I like their definition of a user's lifespan. It mimics social life, but in a relative manner. A rating website is a special category of network, though. Users rarely interact with each other. For the two forums in this paper especially, the links are invisible, as we do not have data on comments or likes. We only observe that a lexical innovation was adopted, but its diffusion might not be related to the forum itself. It might come from elsewhere on the Internet, from a beer ad, or from a fashionable word. We can't conclude that it was learned from other users. My question is, in what sense can we describe a rating website as a network?
Very interesting topic. I am interested in whether this community-based linguistic lifecycle can be projected onto a real lifecycle. As we know, different generations use different sets of words. First, does the study account for the ages of the users? Also, what do the results suggest about effective communication?
Adding on to @Harryx113's comment, I think it would be really interesting to see how demographic variables affect a user's lifespan as well as their adaptation pattern. Would the same pattern be found across different groups of users (e.g. elderly users vs. those born in the 2000s)? It would also be interesting to see how communities of different natures (e.g. Beer Advocate vs. a K-pop fan club) influence users' linguistic change over time.
Since they have a binary classification task, why didn't they compare the logistic classifier result with other classification methods?
This paper does an excellent job of describing linguistic change. It shows that users' language becomes more conservative after linguistic adolescence. I wonder whether it is possible that they turn to other communities. I am also curious about who looks at the posts more frequently.
I really like this paper. Measuring community change from a linguistic perspective is a really cool method: old members stick to the language conventions of their early stage, while the community itself changes rapidly and leaves them behind. My question is based on my understanding of this abandonment. As newcomers flood into the community with new lexical expressions, they soon take over the power of language, and the old members' usage comes to seem old-school. Is it that the use of new language impedes old members from communicating with others, so that they become less progressive and then leave the community? How can we distinguish between the two possible reasons people leave: that they really do not care about the virtual discussion anymore (a natural drop because they return to real life), or the impact of newcomers? A further question: is community change driven by new blood or by new technology?
Nice work! My question is based on my understanding of this abandonment. As newcomers flood into the community with new lexical expressions, they soon take over the power of language, and the old members' usage comes to seem old-school.
Post questions about the following exemplary reading here:
Danescu-Niculescu-Mizil, C., West, R., Jurafsky, D., Leskovec, J. and Potts, C., 2013. “No country for old members: User lifecycle and linguistic change in online communities.” In Proceedings of the 22nd international conference on World Wide Web: 307-318.