UChicago-Thinking-Deep-Learning-Course / Readings-Responses


Week 9 - Possibility Readings #15

Open bhargavvader opened 3 years ago

bhargavvader commented 3 years ago

Post a reading of your own that uses deep learning for social science analysis and understanding, with a focus on Solving Problems & Creating Digital Doubles - in this case, we want you to look for examples with multi-modal data, joint models, joint embeddings, and examples where deep learning was leveraged to solve or explain large scale, complex problems.

Yilun0221 commented 3 years ago

Title: Analysis of Twitter Users’ Lifestyle Choices using Joint Embedding Model

Summary: In this study, the researchers examine how "multiview representation learning" can help us better understand social media corpora. They combine users' social information (user network, user location, and user description, where the latter two constitute the "metadata") with their tweets and apply a joint embedding model. Rather than working with a general tweet corpus, Islam and Goldwasser focus on tweets about yoga and the keto diet, aiming to analyze users' "activity type and motivation" and to classify the users themselves. The architecture is a multiview neural network that fuses the different representations: metadata and user-network representations are built separately from the social information and then combined with a text-based representation of each user's tweets. The authors train, validate, and test the model with different combinations of joint representations and compare it against fine-tuned pre-trained BERT; the joint embeddings reach accuracies around 80% and outperform the fine-tuned BERT baseline. Islam and Goldwasser also plan to use this methodology to explore "community detection based on lifestyle decisions" in future work.
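A minimal sketch of this kind of multiview fusion, in PyTorch, assuming hypothetical feature dimensions and a simple concatenate-and-project fusion rather than the authors' exact architecture:

```python
import torch
import torch.nn as nn

class MultiviewUserClassifier(nn.Module):
    """Toy fusion of text, metadata, and network views of a Twitter user."""
    def __init__(self, text_dim=768, meta_dim=64, net_dim=128, n_classes=2):
        super().__init__()
        # project each view into a shared space before fusing
        self.text_proj = nn.Linear(text_dim, 128)
        self.meta_proj = nn.Linear(meta_dim, 128)
        self.net_proj = nn.Linear(net_dim, 128)
        self.classifier = nn.Sequential(
            nn.Linear(3 * 128, 128), nn.ReLU(), nn.Linear(128, n_classes)
        )

    def forward(self, text_emb, meta_emb, net_emb):
        joint = torch.cat([
            torch.relu(self.text_proj(text_emb)),
            torch.relu(self.meta_proj(meta_emb)),
            torch.relu(self.net_proj(net_emb)),
        ], dim=-1)  # the "joint embedding" of the user
        return self.classifier(joint)

# usage with random stand-in features for a batch of 4 users
model = MultiviewUserClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 64), torch.randn(4, 128))
```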

Expansions to social science analysis: Inspired by the researchers' comments about community detection, I think this methodology can be used to study social stratification and class mobility. For example, it could be used to study how the opinions of one social class influence those of another, letting us explore the discursive power of different classes. Furthermore, with well-tuned representations, a speaker's social information could in turn be predicted from their text.

New dataset exploration: We can also try corpus from other social media about other topics, like people’s comments about a societal event or a policy. In addition, people’s comments towards an article or an editorial from a newspaper or a media can also be explored.

nwrim commented 3 years ago

Lu et al. (2020). 12-in-1: Multi-Task Vision and Language Representation Learning, CVPR 2020, pp. 10437-10446

  1. Brief summary of the article: This article proposes a multi-task model for vision-and-language tasks based on ViLBERT (the paper that introduced that model is suggested as a possibility reading for this week). The starting point is that there is a variety of tasks that involve both vision and language, and models are often fine-tuned to each individual task. This is unsatisfying because the underlying association between vision and language should generalize across most tasks (the example they give is that a model that can label an image "small red vase" should also be able to answer "what color is the small vase?"). Building on this, they train their model jointly on 12 datasets spanning 4 categories of tasks (question answering, referring expressions, multi-modal verification, caption-based image retrieval) using a pipeline they introduce, which has a dynamic stop-and-go training scheduler, task-dependent input tokens, and simple hyper-parameter heuristics; a rough sketch of this shared-trunk, multi-head setup appears after this list. They find that their model often outperforms, in single-task performance, models fine-tuned only to a specific task. They also show that this training scheme results in effective pretraining and (naturally) has far fewer parameters than maintaining 12 different models, each tuned to one task.
  2. Suggestion on how its method could be used to extend social science analysis: Any model that performs better than other models on vision-and-language tasks will be beneficial to social science research in general. For example, a model that can automatically caption what is happening in an image would be greatly helpful if we want to inquire about some specific action at a large scale. Also, I think the finding that a model trained across multiple tasks performs better than models trained only on specific tasks carries a deeper insight: this is closer to how a human learns, and it might be capturing the vision-language relationship in a more "fundamental" way (whatever that means).
  3. Describing what social data you would use to pilot such a use: Image captioning could be applied to photos of social interactions to detect relationships between people. For example, maybe we can detect how "violent" an interaction is and come up with interesting social science research about violence (perception). As for the insight I mentioned, it would be interesting to compare whether the model trained on multiple tasks is closer to human behavior in general, though that might be entering the domain of science fiction as of now.
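As referenced in the summary above, here is a rough sketch of the shared-trunk, multi-head idea behind this kind of multi-task training. The shared encoder, task names, and single hand-picked training call below are illustrative stand-ins, not the paper's actual ViLBERT pipeline or its stop-and-go scheduler:

```python
import torch
import torch.nn as nn

# one shared vision-and-language trunk plus one small head per task
shared_trunk = nn.Sequential(nn.Linear(512, 256), nn.ReLU())   # stand-in for the ViLBERT backbone
heads = nn.ModuleDict({
    "vqa": nn.Linear(256, 3000),         # question answering over a 3000-answer vocabulary
    "retrieval": nn.Linear(256, 1),      # caption-based image retrieval score
    "verification": nn.Linear(256, 2),   # multi-modal verification (true/false)
})
params = list(shared_trunk.parameters()) + list(heads.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)

def train_step(task, fused_features, targets, loss_fn):
    """One multi-task step: the shared trunk receives gradients from whichever task is scheduled."""
    logits = heads[task](shared_trunk(fused_features))
    loss = loss_fn(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# the paper's dynamic stop-and-go scheduler decides which task runs; here we just call one
loss = train_step("vqa", torch.randn(8, 512), torch.randint(0, 3000, (8,)), nn.CrossEntropyLoss())
```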
Raychanan commented 3 years ago

Title: Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components

Summary: In this work, the authors propose an approach to jointly embed Chinese words as well as their characters and fine-grained subcharacter components. They use three likelihoods to evaluate whether the context words, characters, and components can predict the current target word, and collected 13,253 subcharacter components to demonstrate that the existing approaches to decomposing Chinese characters are not enough. Evaluation on both word similarity and word analogy tasks demonstrates the superior performance of their model.
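A highly simplified sketch of this kind of joint objective: a CBOW-style model in which context words, characters, and subcharacter components each contribute a prediction loss for the target word. The vocabulary sizes and averaging scheme below are placeholders, not the paper's exact formulation:

```python
import torch
import torch.nn as nn

class JointCWEC(nn.Module):
    """Predict a target word from context words, their characters, and their components."""
    def __init__(self, n_words=50000, n_chars=8000, n_comps=13253, dim=100):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, dim)
        self.char_emb = nn.Embedding(n_chars, dim)
        self.comp_emb = nn.Embedding(n_comps, dim)
        self.out = nn.Linear(dim, n_words)  # scores over target words

    def forward(self, ctx_words, ctx_chars, ctx_comps):
        # each view is averaged separately, mirroring the three likelihoods
        views = [
            self.word_emb(ctx_words).mean(dim=1),
            self.char_emb(ctx_chars).mean(dim=1),
            self.comp_emb(ctx_comps).mean(dim=1),
        ]
        return [self.out(v) for v in views]  # three sets of logits, three losses

model = JointCWEC()
loss_fn = nn.CrossEntropyLoss()
target = torch.randint(0, 50000, (4,))
logits = model(torch.randint(0, 50000, (4, 4)),
               torch.randint(0, 8000, (4, 8)),
               torch.randint(0, 13253, (4, 16)))
loss = sum(loss_fn(l, target) for l in logits)  # joint objective = sum of the three losses
```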

Expansions to social science analysis: This technique is suitable for most text analysis in Chinese. Methodologically, the morphological elements embedded in Chinese characters are treated as a way to improve Chinese embeddings, highlighting the pictographic nature of the script, which is very clever and interesting. This joint embedding method should therefore be considered by anyone analyzing Chinese text.

Dataset: This technique could be applied to many datasets. For my course final project, it would be worthwhile to apply this method to my study of sentiment changes on Weibo during COVID-19.

cytwill commented 3 years ago

Title: A Multi-label Multimodal Deep Learning Framework for Imbalanced Data Classification

Summary: In this research, the authors propose a new framework for multi-label classification using multimodal data. They use a dataset designed for natural disaster information retrieval and management, which comprises data in the form of video, audio, and text. The authors first generate static embeddings for these different data types using pre-trained models (video: Inception-V3; audio: SoundNet; text: GloVe) and temporal features for the data sequences using a residual bidirectional LSTM network. After obtaining these embeddings, the authors concatenate the unimodal embeddings into a single embedding per sample. The concatenated embeddings are then fed into a modified random forest classifier, which penalizes misclassification of data points from minority classes more heavily. The multi-label classification task is transformed into a single-label task using the Label Powerset algorithm. The researchers also implement feature selection by dropping the lowest-ranked feature in each iteration and comparing the evaluation score with the previous score to find the best subset of concatenated features.

The model performance was evaluated via multi-label classification metrics, including micro-F1, Hamming loss, and micro average precision. Their proposed model outperforms single-modality and dual-modality models, as well as models without feature selection, on all three metrics.
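A minimal sketch of the concatenate-then-classify pipeline described above, using scikit-learn on toy data. The class_weight="balanced" random forest stands in for the authors' modified, penalty-weighted forest, and the Label Powerset transform is done by hand:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 200
# stand-ins for pre-trained unimodal embeddings (video, audio, text)
video, audio, text = rng.normal(size=(n, 32)), rng.normal(size=(n, 16)), rng.normal(size=(n, 50))
X = np.hstack([video, audio, text])          # concatenated multimodal embedding per sample

# toy multi-label targets (3 binary labels), transformed via Label Powerset:
# each unique label combination becomes one single-label class
Y = rng.integers(0, 2, size=(n, 3))
powerset_labels = np.array(["".join(map(str, row)) for row in Y])

# "balanced" class weights approximate the extra penalty on minority combinations
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
clf.fit(X, powerset_labels)
# clf.feature_importances_ could drive the iterative drop-lowest-feature selection
print(clf.predict(X[:5]))
```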

Extension to Social Research: From my perspective, the main contribution of this work to social science research is that it introduces an effective way to do multi-label classification using multi-modal embeddings. The multi-label scenario is very common in real-life data, as each entity can have different attributes from different perspectives. For example, a customer could be interested in both electronic products and science fiction. This research suggests that multi-modal features might help us better generate digital doubles or better describe digitized social entities. The feature selection method takes advantage of the interpretability of tree-based methods and would be useful for other classification tasks and for answering social questions such as who influences whom.

New dataset exploration: For my final project in this course, we will also include a multi-label task (predicting GitHub topic tags). The idea used in this paper might help us handle this task. One thing we need to do first is filter the topic tags to create more effective and informative labels for the repositories. The Label Powerset algorithm might not be a good choice at this first step, since there are too many unique topic tags to build the powerset; tags that are seldom used need to be dropped, and tags with similar semantics might need to be merged into one category.

william-wei-zhu commented 3 years ago

Title: Analysis of Social Media Data using Multimodal Deep Learning for Disaster Response

Summary: This research project combines text and image data from Twitter to identify disasters in real time using convolutional neural networks. Results show that combining text and image data yields higher accuracy than using Twitter text data alone.
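A toy late-fusion sketch of this kind of text-plus-image classifier; the layer sizes, vocabulary, and input shapes below are arbitrary stand-ins rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class DisasterTweetClassifier(nn.Module):
    """Toy late fusion: a small image CNN plus a small text CNN over word embeddings."""
    def __init__(self, vocab=20000, n_classes=2):
        super().__init__()
        self.image_cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.embed = nn.Embedding(vocab, 64)
        self.text_cnn = nn.Sequential(
            nn.Conv1d(64, 32, kernel_size=3), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16 + 32, n_classes)

    def forward(self, image, tokens):
        img_feat = self.image_cnn(image)                               # (batch, 16)
        txt_feat = self.text_cnn(self.embed(tokens).transpose(1, 2))   # (batch, 32)
        return self.head(torch.cat([img_feat, txt_feat], dim=-1))

# usage with random stand-ins: 2 tweets, each with a 64x64 image and 30 tokens
model = DisasterTweetClassifier()
logits = model(torch.randn(2, 3, 64, 64), torch.randint(0, 20000, (2, 30)))
```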

Social science extension: Other data sources, such as video and audio, may also help improve identification accuracy. Besides spotting patterns of disaster, these data sources may be used to predict other concurrent events, such as stock price trends.

New data: Other data sources on Twitter and Reddit.

hesongrun commented 3 years ago

Title: Deep learning for finance: deep portfolios

Summary: The authors explore the use of multi-modal deep learning hierarchical models for problems in financial prediction and classification. Financial prediction problems – such as those presented in designing and pricing securities, constructing portfolios, and risk management – often involve large data sets with complex data interactions that currently are difficult or impossible to specify in a full economic model. Applying deep learning methods to these problems can produce more useful results than standard methods in finance. In particular, deep learning can detect and exploit interactions in the data that are, at least currently, invisible to any existing financial economic theory.

Social Science Extensions: With the availability of more diverse sets of data, the approach developed by the authors can be used to digest a wide array of multi-modal representations of underlying company performance. For example, we have audio data such as recordings of company earnings calls, and image data such as satellite images of company production sites. Last but not least, we have textual data such as news articles and corporate announcements. Combining all these different representations could provide much insight into the future performance of a company.

New Data: As mentioned above, we have visual, vocal, and verbal data for measuring the company's performance.

jsoll1 commented 3 years ago

Title: Cooperative Multimodal Approach to Depression Detection in Twitter

Summary: Well, the title is definitely accurate, since this IS a cooperative multimodal approach to depression detection in Twitter. The idea is that tweets contain both text and images, so an accurate model must be able to join the two. This paper looks at a new kind of multi-agent reinforcement learning model: two policy gradient agents, one for text and one for images, whose joint actions are evaluated together. The problem is that a global optimizer generally provides only a global reward, which isn't ideal. So this paper uses a COMMA (cooperative misoperation multi-agent) policy gradient, which decentralizes the model: two actors learn from two critics, and the reward is based on both local and global conditions. This performs well on the depression benchmark, so it's an exciting new technique.
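A very rough sketch of the two-agent policy-gradient idea, with a reward shared between a text agent and an image agent. This REINFORCE-style toy omits the critics and the local/global reward decomposition that make COMMA work, and all dimensions and rewards are made up:

```python
import torch
import torch.nn as nn

# each agent maps its modality's features to a binary action (e.g. flag the tweet or not)
text_actor = nn.Linear(768, 2)    # 768-dim text features are an arbitrary stand-in
image_actor = nn.Linear(512, 2)   # 512-dim image features are an arbitrary stand-in
optim = torch.optim.Adam(list(text_actor.parameters()) + list(image_actor.parameters()), lr=1e-3)

text_feat, img_feat = torch.randn(1, 768), torch.randn(1, 512)
dist_t = torch.distributions.Categorical(logits=text_actor(text_feat))
dist_i = torch.distributions.Categorical(logits=image_actor(img_feat))
a_t, a_i = dist_t.sample(), dist_i.sample()

# stand-in reward for the joint action (the paper mixes local and global reward terms)
reward = 1.0 if (a_t.item() == 1 and a_i.item() == 1) else -0.1

# REINFORCE: each agent's log-probability is scaled by the shared reward
loss = (-(dist_t.log_prob(a_t) + dist_i.log_prob(a_i)) * reward).mean()
optim.zero_grad()
loss.backward()
optim.step()
```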

Social Science Extension: This generally seems strong in any area that needs both image and text data. I can imagine it being useful for gauging the severity of a disaster when the tweets start coming in, since that is a similar (Twitter) dataset with both images and text. This could be useful for media and relevant authorities.

Data set: Twitter after a disaster

k-partha commented 3 years ago

Multimodal Graph Networks for Compositional Generalization in Visual Question Answering

Summary: Compositional generalization is a key challenge in grounding natural language to visual perception. Recent studies have shown that deep learning models for multimodal tasks like visual question answering often fail to generalize to new inputs that are simply an unseen combination of those seen in the training distribution. This paper proposes to tackle this challenge by employing neural factor graphs for tighter coupling between different modalities (e.g., images and text). The model first creates a multimodal graph, processes it with a graph neural network to induce a factor correspondence matrix, and then outputs a symbolic program to predict answers to questions. They achieve state-of-the-art results on the CLOSURE dataset, improving the mean overall accuracy across seven compositional templates by 4.77%.
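A toy sketch of the "build a multimodal graph, then process it with a graph network" step: plain dense message passing over concatenated image-region and text-token nodes. The factor correspondence matrix and symbolic program from the paper are not reproduced here, and the feature dimensions are placeholders:

```python
import torch
import torch.nn as nn

# stand-in features: 5 detected image regions and 7 question tokens, projected to one space
region_proj, token_proj = nn.Linear(2048, 128), nn.Linear(300, 128)
regions = region_proj(torch.randn(5, 2048))   # e.g. object-detector features
tokens = token_proj(torch.randn(7, 300))      # e.g. word embeddings
nodes = torch.cat([regions, tokens], dim=0)   # (12, 128): nodes of the multimodal graph

# fully connected adjacency across both modalities, row-normalized for mean aggregation
adj = torch.ones(12, 12)
adj = adj / adj.sum(dim=1, keepdim=True)

# one round of message passing: aggregate neighbor features, then transform
update = nn.Linear(128, 128)
messages = adj @ nodes
nodes = torch.relu(update(messages))

# a pooled graph representation that a downstream answer module could consume
graph_repr = nodes.mean(dim=0)
```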

Social Science extension: This approach would be highly relevant to many social scientific datasets containing multimodal data: it embeds the multimodal entities within a graph and then uses the resulting composition for deep-learning analysis of the problem. I can see it being applicable to my own dataset of Twitter personalities, as I have text and image data that are closely associated with one another.

Dataset: Twitter personalities (MBTI) dataset - (tweets + images and bios + profile background pictures)

bakerwho commented 3 years ago

Multimodal Deep Learning. Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., & Ng, A. Y. (2011, January). In ICML.

Summary: This paper is iconic because it comes before the 2012 onset of the deep learning revolution. The authors use deep networks adapted to learn from multi-modal features, and these do better on a host of tasks, such as audiovisual ones. They also show the phenomenon of cross-modality feature learning, where better features for one modality (such as video) can be learned by training on multi-modal data (such as both audio and video). The paper boasts the best published visual speech classification result on the AVLetters dataset, impressive performance for 2011.
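A compact sketch of the bimodal-autoencoder idea: a shared hidden layer trained to reconstruct both modalities, with one modality optionally hidden at training time to encourage cross-modality learning. Layer sizes are arbitrary and this skips the pretraining stages the original paper uses:

```python
import torch
import torch.nn as nn

class BimodalAutoencoder(nn.Module):
    """Shared representation that must reconstruct both audio and video features."""
    def __init__(self, audio_dim=100, video_dim=300, shared_dim=64):
        super().__init__()
        self.enc_audio = nn.Linear(audio_dim, shared_dim)
        self.enc_video = nn.Linear(video_dim, shared_dim)
        self.dec_audio = nn.Linear(shared_dim, audio_dim)
        self.dec_video = nn.Linear(shared_dim, video_dim)

    def forward(self, audio, video):
        # the shared code combines both modalities; either input can be zeroed out
        # during training to push cross-modality feature learning
        shared = torch.relu(self.enc_audio(audio) + self.enc_video(video))
        return self.dec_audio(shared), self.dec_video(shared)

model = BimodalAutoencoder()
audio, video = torch.randn(8, 100), torch.randn(8, 300)
# hide the video input but still require both reconstructions
rec_audio, rec_video = model(audio, torch.zeros_like(video))
loss = nn.functional.mse_loss(rec_audio, audio) + nn.functional.mse_loss(rec_video, video)
```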

Social Science extension: I think the idea of this paper is really gripping and intuitively pleasing. Human cognition is also multimodal: in many tasks, like driving, we rely heavily not just on vision but also on tactile sensing and even smell. It would be interesting to investigate the effect of people's material conditions on how they react to new information in different social settings. How do the same people behave in different social environments? The internet (different social accounts, often serving different social purposes for the same person) is a wealth of data of this type.

Dataset: I'm not sure an easily available dataset exists for this type of application. I'm sure it would be possible to build a scraper that compiles the publicly available profiles of a selected subset of people across different platforms like LinkedIn, Facebook, blogs, personal websites, and more. What could we do with such multimodal data? Can we infer the differences between the zones or 'modal regions' of each sample of behavior?

pcuppernull commented 3 years ago

DeepAD: A Joint Embedding Approach for Anomaly Detection on Attributed Networks

Summary: This paper proposes a method for incorporating both network data and node-specific data for anomaly detection. The authors introduce DeepAD, which uses both nodal attributes and the topological structure of a network to create joint embeddings by stacking layers in a graph convolutional network (GCN). In effect, the authors use an autoencoder network structure and detect anomalies by assessing the errors in reconstructing nodes within the network. Such an approach allows the authors to capture highly nonlinear relationships in their data, and experimental results show that DeepAD outperforms competitor methods for anomaly detection.
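A toy sketch of the reconstruction-error idea behind this kind of model: a one-layer graph-convolution-style encoder that reconstructs both node attributes and adjacency, with anomaly scores taken from per-node reconstruction error. This is a simplification on random data, not the DeepAD architecture itself:

```python
import torch
import torch.nn as nn

n_nodes, attr_dim, hidden = 20, 10, 16
X = torch.randn(n_nodes, attr_dim)                     # node attributes
A = (torch.rand(n_nodes, n_nodes) < 0.1).float()       # toy network structure
A = ((A + A.t()) > 0).float()                          # make it symmetric
A_hat = A + torch.eye(n_nodes)                         # add self-loops
A_hat = A_hat / A_hat.sum(dim=1, keepdim=True)         # simple row normalization

enc = nn.Linear(attr_dim, hidden)                      # graph-convolution-style encoder weights
dec = nn.Linear(hidden, attr_dim)                      # attribute decoder
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)

for _ in range(200):
    Z = torch.relu(A_hat @ enc(X))                     # encode attributes over the graph
    X_rec = dec(Z)                                     # reconstruct attributes
    A_rec = torch.sigmoid(Z @ Z.t())                   # reconstruct structure
    loss = ((X_rec - X) ** 2).mean() + nn.functional.binary_cross_entropy(A_rec, A)
    opt.zero_grad()
    loss.backward()
    opt.step()

# anomaly score per node: nodes the autoencoder reconstructs poorly look "anomalous"
score = ((X_rec - X) ** 2).mean(dim=1) + ((A_rec - A) ** 2).mean(dim=1)
print(score.topk(3).indices)                           # indices of the most anomalous nodes
```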

Social Science Extension: There are a variety of anomaly detection applications for DeepAD within the social sciences. In effect, this method could be applied to any scenario where the researcher has both network and node-specific data. We could think of various uses for social media data, or even settings like the study of democracy, where we have a networks of representatives who each have known attributes.

Data set: DeepAD could be used in the field of international security studies. Various scholars are concerned with "force posturing", which speaks to how militaries are geographically positioned and equipped. A researcher could develop a dataset of state-level navy force posturing, where the network is taken as the geographic positions of ships in the navy and the node-specific information covers data about each ship, the type of weapons it carries, etc. This data could be observed at the daily level and used with DeepAD to understand which periods of time exhibited "anomalous" force postures. Such a measure would allow researchers to better understand the reasons that prompt governments to take unusual action with their militaries.