Open simonsimson opened 3 years ago
In the paper, the authors seem to have done a kind of clustering based on the topics the comments talk about. As there are a lot of comments on the video, doing this by hand takes a long time and is prone to missing comments if done only on a sample. Machine learning can help here by clustering all comments in a short amount of time. In my experience from the last session, it works quite well on YouTube comments, producing clusters of similar comments. The authors also tried to find out whether discussions are happening or whether the comments are mostly one-sided or just short sentences that ignore previous comments and show no intent to provoke a discussion. My experience with ML on YouTube comments showed me that our pipeline could also help find comment chains that represent a meaningful discussion, cluster them together, and make them easier to identify in the big cesspool of hate and one-line comments.
In this scientific paper, researchers Julie Uldam and Tina Askanius conducted a case study of the intense online debates generated around the online mobilization video ("War on Capitalism") of the activist network Never Trust a COP (NTAC), which called for protest against the 15th United Nations Climate Change Conference. They investigated how this video was received by the public, both in Denmark and in other countries. Their focus was the debate around the video. Therefore, they analyzed the comments posted to the video on YouTube and examined the role of the platform in "extending the political involvement mobilized around protest events" (Uldam & Askanius, 2013, p. 1185). Although not clearly stated in the paper, I assume the analysis of the comments was done manually by the researchers, following a cluster-type coding model comprising two dimensions and multiple categories within each dimension (e.g., date of upload, gender, age and country of origin, values, affinity, etc.). ML algorithms might have helped here with the speed of the cluster analysis (done manually, it takes longer), as well as with the size of the data analyzed (many more comments could have been included). Furthermore, ML algorithms might have identified certain types of comments more easily, such as, in the case of this study, short sentences or exclamations that don't seek to start a discussion, or abusive comments, and might have attached them faster to a category within the clustering scheme.
The paper analyzes one particular video on YouTube called "War on Capitalism". As the authors state, they considered 391 comments, many of which are just short statements, exclamations, or abuse that add no relevant information to the main discussion. At the same time, the researchers have a clear goal: to find out whether YouTube can be a platform that provides a space for engaging people in civil debate and facilitates their self-organization. Therefore, I believe this study does not need additional machine learning methods. The number of comments is manageable to read and, more importantly, in order to gain meaningful insights the researchers have to dive into lengthy discussions of "politics, neoliberal governance, or Marxist ideologies". Some comments were even split across several postings to convey an idea or to argue a point, and I think only a person with appropriate training can trace the development of such discussions and draw an appropriate conclusion.
Teammate: @yaozheng600
Comment: In this paper, Uldam et al. explored YouTube's role in potentially extending the political involvement mobilized around protest events by studying the activist network NTAC and its videos related to the COP15 protests. In their study, machine learning played an important role. Firstly, they studied the comments along two dimensions: one through sociodemographic categories and one focusing on the comment content itself. In the second part, semantic analysis was used to cluster 'the enemy as legitimate' and 'the enemy as not legitimate', and similarity comparison should also have been used to extend their study to a larger dataset. Besides, machine learning was also used to distinguish dialogue from abusive commenting, which may contain certain words or punctuation or follow a certain pattern. This work could be very time-consuming without the help of machine learning. We have applied semantic analysis to the videos we chose, but it seems we cannot cluster the comments as well as shown in this paper. We think the reason is that our dataset is too small. We will try to apply our pipeline to a larger dataset instead and see if there are similar results for the climate change topic.
The study by Julie Uldam and Tina Askanius analyzes the comments on the video "War on Capitalism", published by an activist network called "Never Trust a COP" (NTAC) in the prelude to the COP15 protests. Rather than specifically studying the impact of YouTube on the Copenhagen protest, the aim of this paper is more generally to analyze the role of YouTube in the formation of communicative spaces that are prerequisites for the emergence of civic cultures. Two different aspects of the comments were analyzed: socio-demographic aspects (gender, age, country of origin ...) and aspects that focus more on the content itself. Due to the limited number of comments and the complexity of the analysis performed, I think that using machine learning tools would not have led to significant benefits. However, I believe that at an early stage of the semantic analysis, an algorithm that clusters comments by similarity would have been useful for the authors in order to explore the data from another point of view and could have helped them gather new ideas.
The study by Julie Uldam and Tina Askanius focuses heavily on one single video on YouTube, which had only 391 comments. The analysis conducted by hand on these comments can produce clearer results than if it were conducted with ML. However, this is only possible because of the singular focus on one video with a small number of comments. One could argue that to make statements about YouTube as a whole platform, one would need to analyze a far larger sample and then compare that to the in-depth analysis performed by hand. Since such a large-scale analysis is not possible by hand, ML would be quite useful. These new results could then confirm or refine the study's findings and lend greater credibility and applicability to the conclusions.
The paper's authors explored YouTube's role in the politics of climate justice by treating participation in video comments as a form of political engagement. They did this by analyzing online debates, in this case the comments on the "War on Capitalism" video. It seems the authors built the analysis only on an analytical framework (civic cultures), that is, by reading the comments one by one, and did not use any machine learning method to support it. This analytical framework consists of two phases, as stated in the paper: the first aims to get to know the audience, who speaks, and who is listening. This aspect can be understood by studying features related to the comment posters, such as gender, age, date of upload, etc. In the second phase, the analysis focuses purely on the content of the comments and covers five different categories according to the civic culture model. Clustering techniques (such as the K-Medoids or K-Means algorithms) combined with the Universal Sentence Encoder would help carry out the categorization for the second phase, as these techniques are meant to find and group semantically similar comments. It is clear that machine learning results have to be supervised by the researchers, and this would be quite useful as a starting point for the analysis or as a point of comparison.
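A minimal sketch of what that could look like, assuming a plain Python list of comment strings; here I use the sentence-transformers library as a stand-in for the Universal Sentence Encoder, and the example comments, model name, and cluster count are purely illustrative:

```python
# Sketch: group semantically similar comments with sentence embeddings + K-Means.
# The comments below are invented placeholders, not data from the paper.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

comments = [
    "Capitalism is the real problem here.",
    "The whole economic system needs to change.",
    "This is just vandalism, not protest.",
    "Smashing windows helps nobody.",
]

# Encode each comment into a dense vector (model choice is an assumption).
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(comments)

# Cluster the vectors; the number of clusters would need tuning on real data.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(embeddings)
for label, comment in zip(km.labels_, comments):
    print(label, comment)
```

The resulting cluster labels would only be a starting point; the researchers would still have to read and interpret each cluster, as noted above.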
In this paper, Julie Uldam and Tina Askanius analyzed the comments posted to a video on YouTube and examined whether the platform extends the political discussion around the topic of the 15th United Nations Climate Change Conference. For the analysis they used 391 comments from one YouTube video and essentially did the analysis by hand.
While machine learning would not have helped much with the analysis they did, it would have given them the opportunity to analyze larger datasets. Maybe the authors only focused on these few comments because they could not handle more by hand.
Also, the visualizations they made were apparently created with Microsoft Excel. There are many visualization tools for machine learning results that could produce more interesting graphs conveying more information. A sentiment analysis would also definitely give quite interesting insights into the commenters, since a political debate is often shaped not only by the objective information but also by the emotional layer of the messages.
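A hedged sketch of such a sentiment analysis, here using the VADER analyzer from NLTK; the example comments are invented placeholders, not comments from the video:

```python
# Sketch: score the emotional tone of comments with NLTK's VADER analyzer.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

comments = [
    "Great video, this gives me hope for the movement!",
    "This is pointless violence and you should be ashamed.",
]

for comment in comments:
    scores = sia.polarity_scores(comment)
    # 'compound' ranges from -1 (very negative) to +1 (very positive).
    print(f"{scores['compound']:+.2f}  {comment}")
```

On real political comments a lexicon-based scorer like this is only a rough first pass, but it would already surface the emotional layer mentioned above.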
The authors of the paper tried to find out whether the platform mechanics of YouTube play a role in forming communicative spaces. They analyzed 391 comments from the video "War on Capitalism" along two dimensions: first they created categories based on the comment posters and second based on the comment content. It is not clear to what extent machine learning may or may not have been used during the study. However, machine learning could be a great help in clustering the comments. Groupings the authors presented, such as "short statements or exclamations" or "isolated abusive utterances", could be identified with ML and then filtered from the rest of the data. This would speed up the analysis and, especially in large comment datasets, greatly reduce the number of comments to review. ML would also be a great help in identifying discussion chains and users who comment multiple times. I don't think the analysis could have been improved in a significant way with machine learning, but it might have allowed the authors to analyze a video with more comments, or a cluster of videos on a certain theme, by providing the tools to categorize larger amounts of data.
There are already very interesting and diverse comments from my fellow students on the question of whether machine learning methods would have helped with the study from the paper. I don't think I can add any new arguments here. I agree with those answering 'no' or 'not really', because the sample size of fewer than 400 comments was very small, so ML is not needed and would not have improved the accuracy of the manually generated results. But I also agree with those who think that a manual analysis would not have been possible for larger datasets without the help of machine learning. So, as always, it depends on the objective. Maybe the researchers are or were not used to coding and machine learning; learning it might have cost more time than just doing the analysis by hand, at least for this small dataset.
I also have to agree with Tobi that the figures look like they come from a school project. There are definitely better visualization tools.
The paper focuses on the video "War on Capitalism" on YouTube, aiming to explore the role of YouTube in potential mobilization around protest events. The study is interested in two dimensions: one related to the comment posters (date of upload, gender, age, and country of the user) and one related to the content of the comments.
In general, ML enables dealing with big datasets and testing many approaches and hypotheses, but for the case study in the paper, where there is one video with a small number of comments (N = 391), ML could be helpful but not indispensable for detecting correlations between the dimensions (whether the time, gender, age, or country is related to the content of the comments) or for clustering similar comments and attitudes.
On the other hand, the researchers analyzed the content of the comments based on the five processes of the civic cultures circuit. I believe that ML cannot serve as an alternative to grounded theory when we deal with abstract concepts such as identities, but it may add another point of view.
Machine learning algorithms have become very common nowadays. However, it is sometimes not clear to an average person what benefits we can get from using them and how they could affect the way people receive new information. We can look into the use of machine learning algorithms in research through the paper written by Julie Uldam and Tina Askanius, in which they analyzed the comments on the YouTube video "War on Capitalism". For the analysis, unsupervised clustering algorithms were chosen. Using them, they could obtain clusters, where each cluster represents similar comments. It is possible to understand whether there was a discussion or not, and which opinion is most prevalent. It goes without saying that machine learning algorithms, while providing a relatively good analysis, save a lot of time and allow a large amount of data to be analyzed.
It is easy to reach the limits of human abilities, especially when analyzing big data. That's where ML algorithms come into play: they not only save time but also provide new insights and allow us to look at the problem from a different angle. In the paper written by Julie Uldam and Tina Askanius, one can appreciate how clustering algorithms can provide quite good results while staying extremely time-effective. The clusters are used to understand what the main topics of the discussion are, how many comments state roughly the same opinion, and even whether there is a discussion or only one-sided comments. The use of ML algorithms allowed the researchers to use a much larger amount of data than could have been analyzed manually. Furthermore, it allows easy use of multiple categories, such as the date of upload, gender, age, country of origin, and so on.
The authors used the framework of civic cultures proposed by Dahlgren to examine the online debate that emerged around a particular transnational protest event. They used a mobilization video ("War on Capitalism") of the NTAC on YouTube as a case to explore the political engagement mobilized around the video and how the political debate helps to foster civic culture.
The authors analyzed the comments under the video to explore six dimensions of the civic cultures framework (actually, I think only five are discussed in the paper; I don't know whether I counted wrongly or there was a typo). Since they do not explicitly state their analysis method, I assume they did the research manually by viewing all the comments one by one, as we did for the first assignment, and tried to find the comments that described or reflected one of the dimensions.
I think machine learning methods can help in the following ways:
(1) Data visualization. For the dimension of democratic values, they displayed figure one to show the agonistic values. To be honest, since they picked single representative comments and assigned them labels, I can still hardly understand how they analyzed the comments for this dimension and how they arrived at this distribution. With an ML algorithm, we can easily see which label a comment is assigned to, and we can also predefine the keywords we want to focus on during data preprocessing. Beyond that, ML algorithms can provide different visualization methods.
(2) Clustering and topic modeling. The fourth dimension the authors analyzed was "Knowledge", mentioning that "numerous comments cluster around the recurring theme"; accordingly, they analyzed several threads of comments. I think this is a good use case for algorithms such as clustering or topic modeling (see the sketch after this list). Clustering could easily find the comments around a recurring theme, and time-series clustering could reorganize the comments around a certain theme along a specific timeline.
(3) Information extraction. For the analysis of the first dimension, "Visibility and Reach", the researchers could use ML methods to extract demographic data such as gender, age, and country of origin. Actually, for this information they don't need any ML method, but if the researchers want to figure out how people from different cultural backgrounds get involved in online debate, or the influence of their demographic characteristics on their political engagement, they could use ML regression methods for the analysis.
(4) Video analysis. The authors mention that although YouTube allows other forms of user responses such as video responses (p. 1189), only text-based comments are included in the final dataset, I assume because it is demanding to analyze video responses manually. Deep learning algorithms can obviously help with this, from speech recognition to lip reading.
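A minimal sketch of the topic-modeling idea from point (2), using LDA from scikit-learn; the comments, number of topics, and parameter choices are purely illustrative assumptions, not taken from the paper:

```python
# Sketch: recover recurring themes in the comments with LDA topic modeling.
# The comments below are invented placeholders standing in for the 391 real ones.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

comments = [
    "The police protect capital, not people.",
    "Climate change needs system change, not riots.",
    "Burning cars discredits the whole movement.",
    "COP15 is just greenwashing by politicians.",
]

# Bag-of-words representation; stop-word removal keeps the topics readable.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(comments)

# Two topics is an arbitrary choice for this tiny example.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top_terms = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"Topic {idx}: {', '.join(top_terms)}")
```

On the full comment set, the top terms per topic would give a first impression of the recurring themes before any manual reading.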
All in all, I still think that we always need manual work, since researchers can explore the meaning of expressions from a different perspective than machines and may find political metaphors or hidden information.
I found this article very interesting, because it provided a totally different approach to the analysis of YouTube videos than what we have seen so far in other papers in the seminar. I find such a qualitative analysis of a single video insightful, since it gives the researcher a better understanding of what exactly is going on in the comments section. The downside of this method is that it cannot be used for processing a larger number of videos. That's where machine learning methods (such as the pipeline introduced in the seminar) could improve the process and the knowledge gained from various videos by increasing the dataset size. This helps to better understand general views and positions, so a set of videos on the same topic could be analyzed by combining both quantitative and qualitative methods (for instance on topics such as politics and political movements, or, more narrowly, climate change).
Machine learning methods could add to the insights from the paper, even though I find that the authors did really well in substantially analyzing the nature of the debate in the comments section of the "War on Capitalism" video by applying Dahlgren's framework of civic cultures. Nevertheless, with a limited set of categories, and therefore a predefined perspective, conclusions are restricted to the chosen domain of research. A topic modeling approach could be of interest to find categories that were not thought of beforehand and might lead to productive irritation and further insight. This approach would also make the analysis process more efficient because it saves time. Furthermore, it would be really interesting to analyze platform-specific characteristics of the comment section and to analyze cross-references. How did users navigate the platform and how did they reach this specific video?
The authors of this paper, Julie Uldam and Tina Askanius, explored the comments on a particular YouTube video titled "War on Capitalism". They did this exploration with the aim of analyzing the political engagement and mobilization that YouTube enables as a platform. By performing the analysis of the comments by hand, the researchers were able to produce better results than if the analysis had been performed with an ML pipeline. Here I agree with most of my fellow students' opinions. An ML model would not show better results, but I think in this case manual research was possible because the scope of exploration was limited to the comments on one particular video with a manageable number of comments. But I think that to make statements about the whole platform, ML could help gather insights from the comments of a larger number of videos. One could then compare the insights collected on the larger sample with the results of the manual analysis.
Uldam and Askanius (2013) explore ways in which a comment section can be understood as a civic space and commenting as "a mode of political engagement" (p. 1186). The methodology the researchers adopted is similar to the one we adopted in this course, as it is focused on analyzing YouTube comments. However, in contrast to the methods we have adopted so far, mainly categories relating to the content of the comments and the socio-demographic dimension of the commenters are evaluated. In total, they analyzed a little fewer than 400 comments posted across two channels. With regard to evaluating comments on both content-related and socio-demographic dimensions, their dataset could be used to train a baseline classifier, which is then applied to larger datasets for classification. A gain would be to have more than 400 comments as a basis. Finding more videos aligning with their narrow frame of reference for video selection could pose a challenge, though.
This study focuses on the YouTube video "War on Capitalism" and on how comments act as a form of political engagement. The authors used 391 comments from the video and analyzed them using grounded theory. With that they could analyze who the audience is and what the content of the comments is. Since the number of comments is rather small and they only focused on one video, their approach is really precise, but in my opinion it might have been better not to focus on only one video, but rather to analyze more videos on the same topic. And this is where ML methods come into play. As we read last week, the results of grounded theory and ML algorithms are comparable and can give similar insights. By analyzing a bigger dataset (and thus more comments) they could also achieve good results using ML algorithms like clustering. It is not as precise as doing it by hand, but larger studies can give deeper insight into how people engage politically in the comment section. In addition, algorithmic analysis offers many more options for visualizing the data and results.
The paper by Uldam and Askanius (2013) analyses the comments on the YouTube video "War on Capitalism" with respect to the visibility and reach of the comments, as well as by categorizing them along four other dimensions (values, affinity and identity, knowledge, and dialogue and abusive commenting). The researchers manually analysed 391 comments, and this is exactly where ML could come into play. Firstly, the number of comments seems quite low if we want to capture the political debate around a global issue. ML is the perfect tool to scale the main steps applied by the authors and thereby increase the significance of the study, the main steps being creating statistics of the demographics and clustering comments with respect to categories (see the sketch below). Secondly, I feel that when doing those steps manually, single comments may be given too much weight. It could well be that some commenters took less time and thought to write their comment than the authors dedicated to its analysis. As mentioned in the paper, many of the commenters do not engage in a debate and rarely respond to other comments. Therefore, I believe that for those rare cases where an actual thread forms, a manual analysis is to be preferred.
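A small sketch of how the demographic-statistics step could be scaled with pandas; the column names and values are hypothetical stand-ins for the researchers' own coding scheme:

```python
# Sketch: tabulate coded comment metadata at scale with pandas.
# Column names and values are hypothetical, not the paper's actual coding scheme.
import pandas as pd

comments = pd.DataFrame({
    "country": ["Denmark", "Denmark", "Germany", "UK"],
    "gender":  ["m", "f", "m", "unknown"],
    "stance":  ["enemy legitimate", "enemy not legitimate",
                "enemy legitimate", "unclear"],
})

# Cross-tabulate commenter origin against coded stance, the kind of
# summary table the paper builds by hand.
print(pd.crosstab(comments["country"], comments["stance"]))
```

The same few lines would work unchanged on thousands of coded comments, which is where the scaling argument above comes in.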
The paper is a case study of the YouTube video "War on Capitalism", which called for protest against the 15th United Nations Climate Change Conference; the authors analyzed the text comments under this video. This analysis includes, for example, the origin of the commenters (mainly Denmark), but primarily their political affiliation/beliefs. Several statistics are extracted from the comment section, such as the correlation between the denunciation of the enemy and the attitude towards damage in protests (which I found especially interesting). It is also generally concluded that YouTube doesn't promote a discussion as it would occur under more "natural" circumstances, since the use of online spaces can "pose threats to their safety". Two groups that frequently interact in the comments are also identified:
I could not find the video featured in this paper, and I think the number of comments is pretty low (391 across two channels), but I also think that for an insight this complex (political belief, intention), manual review currently provides the best results.
The paper by Uldam and Askanius was very interesting. The investigation of the dataset (containing 391 comments) resulted in findings regarding the distribution of like-minded groups, their agonistic enemies, and YouTube's potential to provide a communicative space for dialogue and discussion. The comments were manually reviewed, which is only possible for a dataset of limited size. To perform the same or a similar analysis on a much more heavily commented video, ML algorithms would need to be used for the similarity clustering of the comments and for topic generation. The categories for the two dimensions (categories related to comment posters and categories related to the content of comments) were also predefined by the authors. Here, the category creation could possibly be refined by neural network techniques that automatically identify the most significant features. This could furthermore lead to new insights about correlations of features within the comment space.
The paper by Uldam and Askanius presents a typical example of a qualitative content study in social science. Apart from the possibilities mentioned by the fellow students above (unsupervised methods such as clustering and topic modeling), I would suggest applying supervised learning methods, which would expand their research scope while retaining their analytical approach. The drawbacks of such a comment classifier would be the huge workload of annotating the data (>1k examples for ML and >10k for DL) and the method's limited explainability.
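A minimal sketch of such a supervised comment classifier, assuming a small hand-annotated seed set; the labels ("dialogue"/"abusive") and example texts are invented for illustration, not annotations from the paper:

```python
# Sketch: train a supervised classifier on a hand-annotated seed set,
# then apply it to unlabelled comments. Texts and labels are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "Let's discuss what realistic alternatives to capitalism look like.",
    "You are all idiots, go get a job.",
    "I disagree, but your point about COP15 is fair.",
    "Shut up, nobody cares what you think.",
]
train_labels = ["dialogue", "abusive", "dialogue", "abusive"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)

# Predict categories for new, unlabelled comments.
print(clf.predict(["Interesting argument, can you share a source?"]))
```

In practice the annotation effort mentioned above would dominate; this sketch only illustrates how a seed set of coded comments could be extended to a much larger corpus.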
In the research paper "Online Civic Cultures: Debating Climate Change Activism on YouTube" by Uldam and Askanius, the researchers want to gain insight into video activism on YouTube. For this, they chose a video called "War on Capitalism", which was a reaction to the COP15 conference in Copenhagen, and then conducted an analysis of the comments on this video.
Unfortunately, they do not describe their methodology in depth, but it can be assumed that they used some kind of computer-assisted automation, because they mention a coding frame using two dimensions: data about users and semantic data from the comments. They also mention their use of Dahlgren's civic cultures framework, but it is unclear what exactly that consists of.
Looking at their results, it seems they first divided the comments into different categories and then analyzed how many comments in each category saw their enemy as 1) not legitimate, 2) unclear, or 3) legitimate.
It is unclear to me whether they used ML to detect the category of a comment and its view on the enemy. If they did not, ML could be an approach to accelerate their analysis by semantically clustering the comments, assigning categories, and perhaps clustering them again into three groups according to their view on the enemy.
1 Reading assignment
Read Paper: Uldam, Julie, and Tina Askanius. "Online Civic Cultures: Debating Climate Change Activism on YouTube." International Journal of Communication 7 (2013): 1185–1204. (available on Whiteboard: CSMA / Resources / 7)
Evaluate in a commentary of 150 words whether and how ML methods could add to the insights from the paper above.
Submit on GitHub (reply to issue) by 16 Dec 12h00 (noon)
2 Seminar project preparation
Continue to work on your group project idea on Discord and add elements to the GSheet: https://docs.google.com/spreadsheets/d/1DdkST3KZV4x9D5nGsHgevIASmu_rFkK0Bx2r4AeBGPE/edit?usp=sharing
If you haven’t found a topic or group yet, you can either a) join an existing group, b) propose a new topic and ask others to join, or c) pick one of the example topics indicated by the instructors. Use the GSheet for these decisions.
Be aware that a first short paper about the research project (concept) has to be submitted by 7 January 2021. To avoid having to work during the winter break, you should already start working on this assignment. Let us know if you have questions in this regard.