FUB-HCC / seminar_critical-social-media-analysis

Creative Commons Zero v1.0 Universal

Quantitative and qualitative analysis of user behaviour based on YouTube videos dealing with climate change #32

Open adrigru opened 3 years ago

adrigru commented 3 years ago

Group members:


User behaviour analysis based on social network data is an extensive research field whose data is usually affected by noise and misleading elements. Interactions on these platforms may involve trolls, bot accounts, human bots, or other entities that, for various reasons, try to sway the sentiment of ongoing discussions.

Our project will focus on user activity and discussions on YouTube and its use as a social network. We will try to answer the following questions regarding bot or user classification:

To do so, we will leverage two different approaches for qualitative and quantitative analysis to successfully classify behavioural patterns on social media interactions, enabling us to answer the aforementioned questions.

Our analysis will focus on climate change, particularly the rise of the Fridays for Future protests. These topics of public interest can be considered controversial because of the political discussions on the international scene, and user interactions about them on social media are often affected by misleading entities [1], which makes them valuable material for our research.

Our starting dataset will be composed of the top n videos matching keywords on our topic of interest, together with their user interactions, retrieved through the public YouTube API. Our selection criteria will consider the language, the number of views, user interactions, likes and dislikes, as well as video descriptions, content creators, and other video and text metadata that we can process for our analysis.

YouTube’s sorting algorithm will enable us to compare the dynamics of the discussions and provide us with different perspectives on the interpretation of the results.
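The retrieval step could be sketched as follows. This is only a minimal illustration, assuming the YouTube Data API v3 `search` and `commentThreads` endpoints are used; the helper names, the default keyword handling, and the choice of `viewCount` and `relevance` as sort orders are our own assumptions, not a fixed part of the project. The functions only build request URLs, so no API key or network access is needed to try them.

```python
from urllib.parse import urlencode

# Base URL of the YouTube Data API v3.
API_ROOT = "https://www.googleapis.com/youtube/v3"


def search_url(query, api_key, language="en", order="viewCount", max_results=50):
    """Build a search.list request for videos matching the keywords.

    The helper name and default values are illustrative assumptions.
    """
    params = {
        "part": "snippet",
        "q": query,
        "type": "video",
        "order": order,                 # e.g. "viewCount", "relevance", "date"
        "relevanceLanguage": language,  # language considered in the selection
        "maxResults": max_results,      # the API allows at most 50 per page
        "key": api_key,
    }
    return f"{API_ROOT}/search?{urlencode(params)}"


def comment_threads_url(video_id, api_key, order="relevance", page_token=None):
    """Build a commentThreads.list request for one video.

    `order` may be "relevance" or "time", which lets us compare the
    discussion under both sort orders.
    """
    params = {
        "part": "snippet,replies",
        "videoId": video_id,
        "order": order,
        "maxResults": 100,              # the API allows at most 100 per page
        "textFormat": "plainText",
        "key": api_key,
    }
    if page_token:
        # Token returned by the previous page, used to continue paging.
        params["pageToken"] = page_token
    return f"{API_ROOT}/commentThreads?{urlencode(params)}"


if __name__ == "__main__":
    # "YOUR_API_KEY" and "VIDEO_ID" are placeholders.
    print(search_url("climate change fridays for future", "YOUR_API_KEY"))
    print(comment_threads_url("VIDEO_ID", "YOUR_API_KEY", order="time"))
```

Requesting the same video once with `order="relevance"` and once with `order="time"` would give two views of the same comment section that can then be compared.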

Let us summarise our analysis framework in the following steps:

We will start with pre-processing and cleaning routines on the raw dataset, removing unnecessary formatting, links, and symbols. Afterwards, we will use the pipeline provided in the seminar to create word embeddings for each comment. Depending on the computational effort, we will decide whether to increase or decrease the number of analysed videos and user interactions. Furthermore, we will perform a cluster analysis using k-means to see whether the resulting clusters comprise individuals who write similar or identical comments under different videos. We will illustrate these results graphically using visualisation techniques.
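The cleaning and clustering steps might look roughly like the sketch below. It is a simplified stand-in under stated assumptions: `clean_comment` and `kmeans` are hypothetical helpers, the embeddings are replaced by toy 2-D vectors because the seminar pipeline is not shown here, and the k-means initialisation (the first k points) is chosen only to keep the example deterministic.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")                 # HTML tags such as <br>
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
SYMBOL_RE = re.compile(r"[^\w\s']")             # punctuation, emoji, symbols


def clean_comment(raw):
    """Strip formatting, links, and symbols from one raw comment."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)                  # "&amp;" -> "&"
    text = URL_RE.sub(" ", text)
    text = SYMBOL_RE.sub(" ", text)
    return " ".join(text.lower().split())       # collapse whitespace


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means; returns one cluster label per point.

    Assumption: the first k points serve as initial centroids so that the
    result is reproducible. A real run would use a library implementation
    with a better initialisation.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    print(clean_comment("Great video!<br>See https://example.com &amp; subscribe"))
    # Toy vectors standing in for comment embeddings.
    toy_embeddings = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
    print(kmeans(toy_embeddings, k=2))
```

In the actual project the vectors passed to the clustering step would come from the seminar's embedding pipeline, and the number of clusters would have to be chosen empirically.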

We will perform a thorough qualitative analysis to refine our observations and identify behaviours that went undetected during the quantitative analysis phase. Specifically, we will analyse whether users, or groups of users, post the same or remarkably similar comments under several videos or under the same video. We can identify such behaviour by analysing the users within each cluster.
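As a starting point for this manual inspection, repeated or near-identical comments within one cluster could be flagged automatically. The sketch below is one possible approach, not the method fixed by the project: `find_similar_comments` is a hypothetical helper, the `(author, video_id, text)` tuple layout is assumed, and the 0.9 similarity threshold is an arbitrary illustrative value. It compares every pair of comments with `difflib.SequenceMatcher` from the standard library, which is only practical for the limited number of comments inside a single cluster.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_similar_comments(comments, threshold=0.9):
    """Return pairs of comments whose text is identical or nearly so.

    `comments` is a list of (author, video_id, text) tuples. Each hit is a
    tuple (author_a, video_a, author_b, video_b, similarity), so pairs from
    one user as well as pairs from different users are reported.
    """
    hits = []
    for (a_author, a_video, a_text), (b_author, b_video, b_text) in combinations(
        comments, 2
    ):
        similarity = SequenceMatcher(None, a_text, b_text).ratio()
        if similarity >= threshold:
            hits.append((a_author, a_video, b_author, b_video, similarity))
    return hits


if __name__ == "__main__":
    # Invented example data for illustration only.
    cluster = [
        ("user_1", "video_a", "climate change is a hoax wake up"),
        ("user_1", "video_b", "climate change is a hoax wake up!"),
        ("user_2", "video_a", "thanks for the great explanation"),
        ("user_2", "video_b", "I disagree with the second point"),
    ]
    for hit in find_similar_comments(cluster):
        print(hit)
```

The flagged pairs would then be read and interpreted by hand, since a high textual similarity alone does not show that an account is a bot or a troll.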

Our qualitative analysis might be based on the media practices suggested in Jones, R. H., Chik, A. and Hafner, C. A. (2015) [2]. There, the authors state that new analytical frameworks are needed to study the social practices associated with digitally mediated discourse. To perform discourse analysis, the authors consider the use of four elements:

Our analysis framework will iterate over the pre-processing, qualitative, and quantitative analysis steps, refining the dataset, the strategy, and the results, and giving us enough insight to determine the degree to which individuals or groups of users control the discussion in the comment section.

Our results will showcase a working framework for identifying bots or internet trolls in discussions on climate change, providing us with valuable information and patterns for removing noise from public discussion forums.

We should recognise how misleading comments and control over social media discussions affect public discourse and generate extreme positions on relevant political topics. This raises the importance of this kind of analysis for detecting manipulation by foreign or malicious actors.

Our framework will build on concepts reviewed during the semester lectures and will use popular natural language processing tools. We will carry it out over six weeks and deliver a report that states the confidence of our classification and its academic foundation.

References:

[1] https://www.theguardian.com/technology/2020/feb/21/climate-tweets-twitter-bots-analysis

[2] Jones, R. H., Chik, A. and Hafner, C. A. (2015) Discourse analysis and digital practices. In: Discourse and Digital Practices: Doing discourse analysis in the digital age. Routledge, London, pp. 117. ISBN 9781138022331. Available at http://centaur.reading.ac.uk/66501/