Lindström, B., Bellander, M., Schultner, D.T., Chang, A., Tobler, P.N., & Amodio, D.M. (2021). A computational reward learning account of social media engagement. Nature Communications, 12, 1311. https://doi.org/10.1038/s41467-020-19607-x
Link: https://www.nature.com/articles/s41467-020-19607-x
1) While social media has become a central method for communication and interaction in modern life, it has also become an arena seemingly driven by rewards, or "likes," on posts. However, despite this apparent action/reward system, there is limited empirical evidence that such a system actually governs behavior. This work applies computational methods based on reinforcement learning theory to show that human behavior on social media conforms both qualitatively and quantitatively to the ideas of reward learning. Specifically, the authors base their models on free-operant behavior in non-human animals: users try to maximize engagement with their posts while weighing the cost of posting against the cost of inaction (not engaging with other people's posts).
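To make the mechanism concrete, here is a minimal sketch (with made-up parameters; this is not the authors' fitted model) of a reward-learning agent that tracks the average number of likes per post and raises or lowers its posting rate as the expected reward outweighs, or fails to outweigh, the effort cost:

```python
import numpy as np

rng = np.random.default_rng(0)

true_likes_per_post = 5.0   # hypothetical environment parameter
effort_cost = 3.0           # assumed cost of composing a post
alpha = 0.1                 # learning rate for the reward estimate

reward_estimate = 0.0       # learned average likes per post
post_rate = 0.5             # probability of posting in a given period

for t in range(1000):
    if rng.random() < post_rate:
        likes = rng.poisson(true_likes_per_post)
        # update the running estimate of reward per post
        reward_estimate += alpha * (likes - reward_estimate)
    # posting becomes more frequent when expected net reward (likes minus effort) is positive
    net_value = reward_estimate - effort_cost
    post_rate = 1.0 / (1.0 + np.exp(-net_value))

print(f"learned reward per post = {reward_estimate:.2f}, posting rate = {post_rate:.2f}")
```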
The AI Economist: Taxation policy design via two-level deep multiagent reinforcement learning
Link: https://www.science.org/doi/10.1126/sciadv.abk2607
Hierarchical Reinforcement Learning for Open-Domain Dialog Authors: Saleh, A., Jaques, N., Ghandeharioun, A., Shen, J., & Picard, R. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05), 8741-8748. https://ojs.aaai.org//index.php/AAAI/article/view/6400
1) briefly summarizes the article (e.g., as we do with the first “possibility” reading each week in the syllabus),
The paper aims to improve open-domain dialog systems using hierarchical reinforcement learning. Open-domain, or open-ended, dialog systems try to establish long-term connections with users by satisfying the human need for communication, affection, and social belonging, rather than completing a specific task. However, such systems face four main issues: a tendency toward malicious, aggressive, biased, or offensive responses; low quality and sensitivity of the generated text; a bias toward dull and repetitive text caused by maximum likelihood estimation (MLE) training; and difficulty tracking long-term aspects of the conversation. The authors address these problems by developing Variational Hierarchical Reinforcement Learning (VHRL), which uses policy gradients to adjust the prior probability distribution of the latent variable learned at the utterance level of a hierarchical variational model. In addition, they draw on toxicity, the psychology of good conversation, and other conversation metrics to construct human-centered rewards. Their evaluation includes an external interactive human evaluation platform, which showed that their solution outperformed other dialog architectures in human judgments of conversational quality.
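As a rough illustration of what a policy-gradient update at the utterance level looks like (a toy sketch, not VHRL itself: the "latent" here is just a choice among three hypothetical response styles, and the human-centered reward is a made-up table combining an engagement bonus and a toxicity penalty):

```python
import numpy as np

rng = np.random.default_rng(1)
styles = ["empathetic", "neutral", "sarcastic"]     # stand-in for the utterance-level latent
reward_table = {"empathetic": 1.0, "neutral": 0.3, "sarcastic": -0.5}   # assumed human-centered reward

theta = np.zeros(len(styles))   # policy logits over utterance styles
lr = 0.05

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for episode in range(2000):
    probs = softmax(theta)
    a = rng.choice(len(styles), p=probs)
    reward = reward_table[styles[a]] + rng.normal(scale=0.1)   # noisy human feedback
    # REINFORCE: gradient of log pi(a) for a categorical policy is one_hot(a) - probs
    grad = -probs
    grad[a] += 1.0
    theta += lr * reward * grad

print({s: round(float(p), 2) for s, p in zip(styles, softmax(theta))})
```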
2) suggests how its method could be used to extend social science analysis,
This approach of using VHRL to improve open-domain dialog could be applied widely in social science research and applications, such as improving user experience, customer communication, personalized suggestions, and news recommendations, and even in gaming (training an algorithm to win a game).
3) describes what social data you would use to pilot such a use with enough detail that someone could move forward with implementation.
As our project is related to meme classification and meme generation, I imagine this approach would be useful for meme generation: rewarding good humor and penalizing toxicity (hateful, misogynistic memes) to avoid a bias toward malicious, aggressive, biased, or offensive generated memes.
Zheng, S., Trott, A., Srinivasa, S., Parkes, D., & Socher, R. (2022). The AI Economist: Optimal Economic Policy Design via Two-level Deep Reinforcement Learning. https://www.science.org/doi/pdf/10.1126/sciadv.abk2607?download=true
In this paper, the researchers design an algorithm to solve the economists' social planner problem. Many theoretical and empirical solutions have been proposed in the literature, but all have shortcomings due to the complexity of the problem and the insufficiency of historical data. The proposed algorithm comprises one deep reinforcement learning model at the individual level and another at the level of the social planner. Each actor learns a behavioral policy and then iteratively uses its value function to decide on new actions. The authors use this model to design an optimal tax policy. It should be noted that the optimization happens at a macroeconomic level, so not all individual-level interactions are taken into account. The authors use simulated data of 100 agents. The simulation includes spatial factors, individual characteristics, and potential trading behaviors.
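A heavily simplified sketch of the two-level structure (the flat tax, lump-sum rebate, tabular learners, and productivity-times-equality welfare signal below are my own assumptions for illustration, not the paper's setup): inner agents learn how much to work given a tax rate, while an outer planner searches over tax rates to maximize a social-welfare signal.

```python
import numpy as np

rng = np.random.default_rng(2)
skills = np.array([1.0, 2.0, 4.0])            # heterogeneous agent skills (assumed)
labor_options = np.linspace(0, 1, 11)         # possible labor supplies
tax_options = np.linspace(0, 0.9, 10)         # planner's candidate flat tax rates

agent_q = np.zeros((len(skills), len(tax_options), len(labor_options)))
planner_q = np.zeros(len(tax_options))
alpha, eps = 0.1, 0.1

for step in range(20000):
    # outer level: planner picks a tax rate (epsilon-greedy)
    t_idx = rng.integers(len(tax_options)) if rng.random() < eps else planner_q.argmax()
    tax = tax_options[t_idx]

    # inner level: each agent picks how much to work under that tax rate
    a_idx = np.array([
        rng.integers(len(labor_options)) if rng.random() < eps else agent_q[i, t_idx].argmax()
        for i in range(len(skills))
    ])
    labor = labor_options[a_idx]
    pretax = skills * labor
    rebate = tax * pretax.sum() / len(skills)           # lump-sum redistribution of tax revenue
    incomes = (1 - tax) * pretax + rebate
    utilities = incomes - labor ** 2                    # income minus labor disutility

    for i in range(len(skills)):
        agent_q[i, t_idx, a_idx[i]] += alpha * (utilities[i] - agent_q[i, t_idx, a_idx[i]])

    # planner reward: total income scaled by (1 - a Gini-like inequality measure)
    gini = np.abs(incomes[:, None] - incomes[None, :]).mean() / (2 * incomes.mean() + 1e-9)
    planner_q[t_idx] += alpha * (incomes.sum() * (1 - gini) - planner_q[t_idx])

print("planner's preferred tax rate =", round(float(tax_options[planner_q.argmax()]), 2))
```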
Macroeconomists often deal with optimization problems. Many of these problems can neither be solved theoretically because of their complexity nor econometrically because of the insufficiency of historical data. Taxation is just one example. One could think of similar problems in determining interest rates as another example. I believe the proposed framework could have many applications in solving such optimization problems in macroeconomics.
As described above, Zheng et al. (2022) only use simulated data in this study. It is a common practice for macroeconomists to test their models using such simulated data. There are two reasons for this. First, unlike real-world data points, such datasets are free of any irrational behavior and additional complexity. Second, simulation is just easier than accessing and cleaning macro data. However, as the authors themselves explain, the next step of the analysis would be testing the model with real-world data.
Title: Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning Authors: Michael Bradley Johanson, Edward Hughes, Finbarr Timbers, and Joel Z. Leibo Link: https://arxiv.org/pdf/2205.06760.pdf
Comprehensive Review of Deep Reinforcement Learning Methods and Applications in Economics
Paper: Segregation dynamics with reinforcement learning and agent based modeling https://www.nature.com/articles/s41598-020-68447-8
In short, this paper combines reinforcement learning and agent-based modeling to study residential segregation. The classic Schelling segregation model pioneered this area, arguing that even mild personal preferences for living near people of the same type can lead to residential segregation. The authors' study is inspired by the reward and interaction rules of the Schelling model. The advantage of the model in this paper is that it promotes the creation of interdependencies and interactions among multiple agents of two different kinds that would otherwise segregate from each other. They find that spatial integration can be achieved by establishing interdependencies among agents of different kinds. They also show that segregated areas are more likely to host older agents, while diverse areas attract younger ones.
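For a sense of how RL plugs into a Schelling-style ABM, here is a minimal sketch (my own toy setup, not the paper's model): agents on a grid learn, via tabular Q-learning, whether to stay or relocate, with a reward encoding a mild same-type preference and a small moving cost.

```python
import numpy as np

rng = np.random.default_rng(3)
size, n_agents = 20, 300
grid = -np.ones((size, size), dtype=int)              # -1 = empty cell, 0/1 = agent type
cells = rng.permutation(size * size)[:n_agents]
grid[np.unravel_index(cells, grid.shape)] = rng.integers(0, 2, n_agents)

def same_share(r, c):
    """Share of same-type agents among the occupied neighbors of cell (r, c)."""
    t = grid[r, c]
    block = grid[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2].ravel()
    nbrs = block[block >= 0]
    same, total = (nbrs == t).sum() - 1, len(nbrs) - 1    # exclude the agent itself
    return 0.5 if total == 0 else same / total

Q = np.zeros((11, 2))                                  # states: same-share decile; actions: stay, move
alpha, eps, move_cost = 0.1, 0.1, 0.1

for step in range(50000):
    occupied = np.argwhere(grid >= 0)
    r, c = occupied[rng.integers(len(occupied))]
    s = int(round(same_share(r, c) * 10))
    a = rng.integers(2) if rng.random() < eps else Q[s].argmax()
    if a == 1:                                         # relocate to a random empty cell
        empty = np.argwhere(grid < 0)
        er, ec = empty[rng.integers(len(empty))]
        grid[er, ec], grid[r, c] = grid[r, c], -1
        r, c = er, ec
    # mild Schelling-style preference: reward saturates at a same-type share of 0.5
    reward = min(same_share(r, c), 0.5) - (move_cost if a == 1 else 0.0)
    Q[s, a] += alpha * (reward - Q[s, a])

print("learned stay/move values per same-share decile:\n", Q.round(2))
```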
I think the combination of reinforcement learning and agent-based modeling could bring new life to traditional social science areas, especially the simulation of social patterns. We can use multi-agent reinforcement learning to represent various kinds of people and reverse-engineer how people's real-world behaviors are motivated. In other words, we can design learning rewards and punishments according to our hypothesis, implement the model according to the rules we set, and then compare the model's results with real-world data to see whether our hypothesis holds. This could boost the study of human behavior at a large scale, especially in scenarios where people need to interact with others. Further, we could even use the verified model to propose directions for policy making: by observing how agents' behavior changes, we can gauge the influence of policies and thereby get a prior sense of how they would affect the targeted people in the real world (though this could be a bit controversial).
I would probably use the combination of agent-based modeling and reinforcement learning to study opinion polarization on social media platforms. On some platforms, people post and others can upvote or downvote the post, or respond positively or negatively to it; on platforms like Twitter you can even retweet the post. Some previous studies have indicated that small groups may form and that opinions across those groups become more distant from each other. I would like to test both the effect of homophily and the effect of social reward/punishment. First, I would explore whether a preference for homophily helps the formation of groups/sub-networks in the model, and then check the similarity of posts on the platform to verify the results obtained by the model. I would also use downvotes and upvotes as social rewards for my agents, to see how they are motivated toward different types of posts depending on the period they are in. For example, if they are involved in a small group, will they post more posts that are similar to the style of that group? What would be different if they were not affiliated with small groups? I think this can help answer how we behave in the era of the internet.
Wang, Y., Yang, W., Ma, F., Xu, J., Zhong, B., Deng, Q., & Gao, J. (2020, April). Weak supervision for fake news detection via reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, No. 01, pp. 516-523).
Solving the problem of unreliable labeled samples has always been a troubling part of fake news detection. In this article, the authors propose a weak-supervision framework for fake news detection called WeFEND. The framework is divided into an annotator, a reinforced selector, and a fake news detector. First, based on users' reports, the annotator assigns weak labels to the data. Then a reinforcement learning process picks high-quality samples. Finally, the selected labeled data are passed to the fake news detector. In their experiments, the model trained on automatically annotated data outperforms other models in every category except the recall for discerning real news, with scores ranging from 75% to 88%.
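To illustrate what a reinforced selector can look like in miniature (a toy sketch on synthetic data, not the WeFEND implementation: the "detector" is a nearest-centroid classifier, and the selector is a per-bucket REINFORCE policy rewarded by validation-accuracy gains):

```python
import numpy as np

rng = np.random.default_rng(4)

def make_data(n):
    x = rng.normal(size=(n, 2))
    y = (x[:, 0] + x[:, 1] > 0).astype(int)     # true labels (unknown during training)
    quality = rng.random(n)                     # e.g., reliability of the user report
    flipped = rng.random(n) > quality           # low-quality reports flip labels more often
    weak = np.where(flipped, 1 - y, y)
    return x, y, weak, quality

x_tr, y_tr, weak_tr, q_tr = make_data(500)
x_val, y_val, _, _ = make_data(200)             # small trusted validation set

theta = np.zeros(10)                            # keep-logit per report-quality bucket
buckets = np.minimum((q_tr * 10).astype(int), 9)
lr = 0.5

def detector_accuracy(keep):
    """Validation accuracy of a nearest-centroid 'detector' trained on the kept samples."""
    k0, k1 = keep & (weak_tr == 0), keep & (weak_tr == 1)
    if k0.sum() < 5 or k1.sum() < 5:
        return 0.5
    c0, c1 = x_tr[k0].mean(axis=0), x_tr[k1].mean(axis=0)
    pred = (np.linalg.norm(x_val - c1, axis=1) < np.linalg.norm(x_val - c0, axis=1)).astype(int)
    return (pred == y_val).mean()

baseline = detector_accuracy(np.ones(len(x_tr), dtype=bool))
for episode in range(300):
    p_keep = 1 / (1 + np.exp(-theta[buckets]))
    keep = rng.random(len(x_tr)) < p_keep
    reward = detector_accuracy(keep) - baseline          # improvement over keeping everything
    # REINFORCE update of the per-bucket keep probabilities
    grad = np.zeros_like(theta)
    np.add.at(grad, buckets, (keep - p_keep) * reward)
    theta += lr * grad / len(x_tr)

print("learned keep probability by report-quality bucket:",
      (1 / (1 + np.exp(-theta))).round(2))
```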
This is a paper applying reinforcement learning in the context of social media. The authors use report data as weak labels for predicting over a larger set of news. This method could similarly be applied to other social media platforms that rely on community policing; the unreliability of community policing makes these platforms a natural fit for experimenting with such a model. I also wonder how the different landscapes of different social media platforms might cause the prediction results to vary.
Admittedly, data similar to that used in this study is hard to find. But I think sentiment analysis, the other classification task popular in social media analysis, could be conducted in a similar way. For example, we may not be able to get users' reports, but we can access the comments on a set of articles on social media. The sentiments expressed in the comments can be treated as weak labels and then fed into the same process.
Tampuu, Ardi, Tambet Matiisen, Dorian Kodelja, Ilya Kuzovkin, Kristjan Korjus, Jaan Aru, Raul Vicente, and Juhan Aru. “Multiagent Cooperation and Competition with Deep Reinforcement Learning.” PLoS ONE 12, no. 4 (April 5, 2017): 1–15. doi:10.1371/journal.pone.0172395.
This paper employs the deep Q-learning framework in a setting involving multiple individuals to find out when they compete or cooperate with each other. Using the video game Pong, the researchers analyze how and when competitive or cooperative behavior emerges and how shifting incentives can shift competition to cooperation. The paper also demonstrates that playing against other autonomous agents leads to more diverse strategies than playing against hard-wired algorithms.
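The key lever here is the reward schedule. A minimal sketch of how such a schedule can be parameterized (the exact values are assumed for illustration): the agent that loses the ball always receives -1, while the scoring agent receives rho, which sweeps from fully competitive (+1) to fully cooperative (-1).

```python
def pong_rewards(scorer: str, rho: float) -> dict:
    """Return per-agent rewards for a single scoring event.

    rho = +1.0 : classic competitive Pong (scoring is rewarded)
    rho = -1.0 : losing the ball is costly for both players, encouraging cooperation
    """
    if scorer == "left":
        return {"left": rho, "right": -1.0}
    return {"left": -1.0, "right": rho}

# sweep from competition to cooperation
for rho in (1.0, 0.0, -1.0):
    print(rho, pong_rewards("left", rho))
```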
Encouraging the cooperation of autonomous individuals is one of the key questions in social sciences. While Adam Smith argued that competition between individuals would maximize the utility of society, the collective action problem, exemplified in the tragedy of the commons and the prisoner’s dilemma, illustrates that it is imperative for a fully functioning society to foster cooperation in one way or another. While scholars such as Elinor Ostrom strived to find solutions to these problems, critics have pointed out the difficulties of applying those resolutions en masse. Deep reinforcement learning might offer a solution to this conundrum. By studying when and how autonomous entities decide to cooperate, social scientists could now understand the conditions that generate cooperation between rational individuals. Moreover, creating complex games that resemble specific social situations could also offer a more direct resolution to social circumstances. For instance, the lemons problem is a situation where cooperation is required to maximize the utility of society. Employing deep reinforcement learning in this situation may offer policy advice on how to create a market where good cars could be sold at a reasonable price.
Elinor Ostrom’s book Governing the Commons exemplifies numerous cases where societies avoided the tragedy of the commons. Creating a game that resembles the cases, and eliminating certain conditions one by one could teach us which condition plays the most important role in reinforcing cooperation.
Ciranka, S., Linde-Domingo, J., Padezhki, I., Wicharz, C., Wu, C. M., & Spitzer, B. (2022). Asymmetric reinforcement learning facilitates human inference of transitive relations. Nature Human Behaviour, 6(4), 555-564.
1) briefly summarizes the article (e.g., as we do with the first “possibility” reading each week in the syllabus) Transitive inference means inferring the relation between two items from their existing relations with other items (e.g., if A beats B and B beats C, then A beats C). This study proposes that humans infer such novel relations via an asymmetric learning policy, which differs from a simple reinforcement learning mechanism. Across four experiments, the authors find that observers update their beliefs about the winner (or loser) of a pair asymmetrically, which supports the asymmetric learning policy. In addition, in simulations they find that value estimates for the winning and losing items come out similar under full feedback, while under partial feedback the asymmetric model, which updates only one side, works best. Therefore, human learners can learn transitive relations strategically even with only sparse feedback.
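The core update rule is simple enough to sketch directly (the learning rates below are assumed, not the paper's fitted parameters): after observing that one item beats another, the learner raises the winner's value and lowers the loser's value at different rates, and novel (transitive) comparisons are then read off the learned values.

```python
import numpy as np

rng = np.random.default_rng(5)
n_items = 7
true_rank = np.arange(n_items)          # ground-truth ordering: item 0 < item 1 < ... < item 6
V = np.zeros(n_items)                   # learned item values
alpha_win, alpha_lose = 0.3, 0.05       # asymmetric learning rates (assumed, not fitted)

for trial in range(2000):
    i, j = rng.choice(n_items, size=2, replace=False)
    winner, loser = (i, j) if true_rank[i] > true_rank[j] else (j, i)
    # prediction error based on the current value difference
    p_win = 1 / (1 + np.exp(-(V[winner] - V[loser])))
    V[winner] += alpha_win * (1 - p_win)    # winner updated strongly
    V[loser] -= alpha_lose * (1 - p_win)    # loser updated weakly (asymmetry)

# transitive inferences can now be read off the learned values
print("learned values:", V.round(2))
print("item 5 inferred to beat item 1?", bool(V[5] > V[1]))
```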
2) suggests how its method could be used to extend social science analysis: If A likes B and B likes C, will A like or dislike C? The approach in this paper could help us estimate such relations. Social preference, in social media and in real life, is important for revealing homophily, especially when we only have a partial network or a partial sample. With this method, we could either complete the network based on inference or test existing theories about relationships.
3) describes what social data you would use to pilot such a use with enough detail that someone could move forward with implementation In our project, we use urban economic data and satellite data. Could we estimate that one area is more gentrified than another, and so on, to obtain a complete scale of the degree of gentrification instead of a binary classification? In addition, could we include such relations as features of the observations and feed them into the prediction model?
Multi-modal Active Learning From Human Data: A Deep Reinforcement Learning Approach
This work focuses on using reinforcement learning for active learning on multimodal data. Previous research focused on using RL for machine translation with multimodal data or for uni/multimodal question answering. Multimodal data is a treasure trove of personalised information, but there are two major roadblocks: generic supervised classifiers don't work well on predefined labels across different people, and labelling multimodal data takes substantial effort and is error-prone. The authors address these with a deep reinforcement learning approach to active learning.
Researchers have long bemoaned the effort it takes to label large amounts of data efficiently and have resorted to human annotation. Naive active learning techniques like uncertainty sampling and entropy-based methods haven't been very effective. This research has the potential to build customised classifiers for social science research from a limited amount of high-quality labeled data. The application areas could range from HCI (personalised robots) to studying the behaviour of people.
One potential area where this could be applied is our final project, where we are trying to study the behaviour of different U.S. presidents on different platforms. We could run a pilot study using multimodal data collected from archives and other internet sources and train the Q-function to choose the best data for us to label, in turn building a personalised classifier for each of the presidents rather than simply using the same pretrained and fine-tuned model for all the presidents in the study.
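A rough sketch of what such a pilot loop could look like (synthetic features, a nearest-centroid stand-in for the classifier, and a tabular Q-function over confidence buckets are all my simplifications, not the paper's architecture): the agent decides, sample by sample, whether paying the labeling cost is worth the expected gain in validation accuracy.

```python
import numpy as np

rng = np.random.default_rng(6)

def make_data(n):
    x = rng.normal(size=(n, 4))                        # stand-in for fused multimodal features
    y = (x @ np.array([1.0, -1.0, 0.5, 0.0]) > 0).astype(int)
    return x, y

x_pool, y_pool = make_data(400)
x_val, y_val = make_data(200)

labeled_x, labeled_y = list(x_pool[:20]), list(y_pool[:20])   # small seed set of human labels
Q = np.zeros((5, 2))                                   # 5 confidence buckets x {skip, query}
alpha, eps, label_cost = 0.2, 0.1, 0.02

def centroid_fit_and_score():
    """Fit a nearest-centroid classifier on the labeled set; return val accuracy and centroids."""
    X, Y = np.array(labeled_x), np.array(labeled_y)
    c0, c1 = X[Y == 0].mean(axis=0), X[Y == 1].mean(axis=0)
    pred = (np.linalg.norm(x_val - c1, axis=1) < np.linalg.norm(x_val - c0, axis=1)).astype(int)
    return (pred == y_val).mean(), c0, c1

acc, c0, c1 = centroid_fit_and_score()
for i in range(20, len(x_pool)):
    d0, d1 = np.linalg.norm(x_pool[i] - c0), np.linalg.norm(x_pool[i] - c1)
    confidence = abs(d0 - d1) / (d0 + d1)              # crude confidence on this sample
    s = min(int(confidence * 5), 4)
    a = rng.integers(2) if rng.random() < eps else Q[s].argmax()
    reward = 0.0
    if a == 1:                                         # ask the human annotator for a label
        labeled_x.append(x_pool[i])
        labeled_y.append(y_pool[i])
        new_acc, c0, c1 = centroid_fit_and_score()
        reward = (new_acc - acc) - label_cost
        acc = new_acc
    Q[s, a] += alpha * (reward - Q[s, a])

print("final validation accuracy:", round(float(acc), 3))
print("Q-values (skip vs. query) by confidence bucket:\n", Q.round(3))
```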
Martinez, A. D., Del Ser, J., Osaba, E., & Herrera, F. (2021). Adaptive multi-factorial evolutionary optimization for multi-task reinforcement learning. IEEE Transactions on Evolutionary Computation.
Controlling the Risk of Conversational Search via Reinforcement Learning https://doi.org/10.1145/3442381.3449893
Users often formulate their search queries and questions in immature language, without well-developed keywords or complete structure. Such queries are likely to fail to express their true information needs and to raise ambiguity, as fragmentary language often admits various interpretations and aspects. In this work, the authors propose a risk-aware conversational search agent that uses reinforcement learning to balance the risk of answering a user's query against the option of asking a clarifying question.
The decision-maker uses a deep Q-network (DQN) to decide between answering the query with the best answer and asking the best clarifying question. The DQN uses a BERT-based encoder to encode the initial query q, the context history h, and the top k clarifying questions {cq_1, ..., cq_k}, taking the [CLS] token vectors as their feature representations. Finally, it concatenates all the features and generates a 2 × 1 decision vector. It also reads the reranking scores s^cq_{1:k} and s^ans_{1:k} of the top k questions and answers from the reranker output.
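At the level of tensor shapes, the decision step is easy to sketch (random stand-ins below for the BERT [CLS] embeddings, the reranker scores, and the DQN head; I am also assuming the scores are simply concatenated with the embeddings, which the paper may handle differently):

```python
import numpy as np

rng = np.random.default_rng(7)
d, k = 768, 3                                  # embedding size and number of candidates

cls_query = rng.normal(size=d)                 # [CLS] vector of the initial query q
cls_history = rng.normal(size=d)               # [CLS] vector of the context history h
cls_questions = rng.normal(size=(k, d))        # [CLS] vectors of the top-k clarifying questions
scores_q = rng.random(k)                       # reranker scores for the top-k questions
scores_a = rng.random(k)                       # reranker scores for the top-k answers

features = np.concatenate([cls_query, cls_history, cls_questions.ravel(), scores_q, scores_a])

# a single linear layer standing in for the DQN head
W = rng.normal(scale=0.01, size=(2, features.size))
q_values = W @ features                        # Q(answer now), Q(ask clarifying question)
action = ["answer", "ask_clarifying_question"][int(q_values.argmax())]
print(q_values.round(3), "->", action)
```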
This reinforcement learning setup can be adapted to many social science research cases with diverse risks or costs for decision-makers. A unified measurement of reward and penalty is required to make it work, but it can be simulated, as the authors did with random sampling and user simulation. It can also be applied to classification problems where labeling data is expensive or difficult, since the reinforcement approach could take the user's reaction as the output label.
Reinforcement learning algorithms can simulate from logged, real experience and model alternative outcomes under counterfactual actions, preventing model biases derived from complex environments. This paper proposes the Counterfactually-Guided Policy Search (CF-GPS) algorithm for learning policies in POMDPs from off-policy experience (i.e., from policies that were not actually executed). CF-GPS leverages a model to explicitly consider alternative outcomes, allowing the algorithm to make better use of experience data. The authors find empirically that the advantages of CF-GPS translate into improved policy evaluation and search.
This paper applies a causal framework to reinforcement learning model design. The authors connect concepts from reinforcement learning and causal inference, showing that counterfactual reasoning in structural causal models on off-policy data can facilitate solving non-trivial reinforcement learning tasks. They also re-interpret two previous algorithms as alternative counterfactual methods. CF-GPS assumes there are no additional hidden confounders in the environment; otherwise the model could wrongly attribute a negative outcome to the agent's actions rather than to environmental factors.
I think CF-GPS could be extended to further policy fields, such as international policy, where nation-states are assumed to be the principal actors. The interactions among countries and the decision-making of international organizations could then be simulated, and the effects of counterfactual actions could be evaluated.
Reinforcement Learning for Mean Field Games, with Applications to Economics Andrea Angiuli, Jean-Pierre Fouque, Mathieu Lauriere Link to Paper
Influence maximization in unknown social networks: Learning Policies for Effective Graph Sampling Link: https://arxiv.org/abs/1907.11625
In this paper, the authors propose a reinforcement learning framework to discover effective network sampling heuristics by leveraging automatically learned node and graph representations that encode important structural properties of the network. At training time, the method identifies portions of the network such that the nodes selected from this sampled subgraph can effectively influence nodes in the complete network. The output of this training is a transferable, adaptive policy that identifies an effective sequence of nodes to query on unseen graphs.
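A much-simplified sketch of the problem setting (not the paper's learned-representation policy: here the agent only learns, bandit-style, which of two hand-made querying heuristics reveals the more influential seed nodes, with reward given by a simulated independent-cascade spread over the full hidden graph):

```python
import numpy as np

rng = np.random.default_rng(8)
n, edge_p, budget, n_seeds = 100, 0.04, 10, 3
A = rng.random((n, n)) < edge_p
A = np.triu(A, 1)
A = A | A.T                                            # undirected random graph (hidden from the agent)

def simulate_spread(seeds, prob=0.1, runs=20):
    """Average size of an independent-cascade spread started from the seed nodes."""
    total = 0
    for _ in range(runs):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            new = []
            for u in frontier:
                for v in np.flatnonzero(A[u]):
                    if v not in active and rng.random() < prob:
                        active.add(v)
                        new.append(v)
            frontier = new
        total += len(active)
    return total / runs

heuristics = ["random", "most_referenced"]             # two hand-made querying strategies
Q, counts = np.zeros(len(heuristics)), np.zeros(len(heuristics))

for episode in range(200):
    h = rng.integers(len(heuristics)) if rng.random() < 0.1 else int(Q.argmax())
    observed = {int(rng.integers(n))}                  # start from a single known node
    queried, known_nbrs = set(), {}
    for _ in range(budget):
        candidates = list(observed - queried)
        if not candidates:
            break
        if heuristics[h] == "random":
            u = int(rng.choice(candidates))
        else:      # query the unqueried node most often seen as a neighbor of queried nodes
            u = max(candidates, key=lambda c: sum(c in nbrs for nbrs in known_nbrs.values()))
        nbrs = [int(v) for v in np.flatnonzero(A[u])]
        known_nbrs[u] = nbrs
        queried.add(u)
        observed.update(nbrs)
    # seeds = queried nodes with the most discovered neighbors
    seeds = sorted(known_nbrs, key=lambda v: len(known_nbrs[v]), reverse=True)[:n_seeds]
    reward = simulate_spread(seeds)
    counts[h] += 1
    Q[h] += (reward - Q[h]) / counts[h]

print(dict(zip(heuristics, Q.round(1))))
```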
The method introduced by this paper could be extended to other graph discovery problems by altering the reward function. The learned graph embeddings tended to pick nodes with high betweenness centrality with respect to the entire network, which was key to discovering important portions of the network. A detailed investigation into the complex patterns learned by the model could reveal insights into the structural properties of social networks pertaining to the identification of influential nodes.
In particular, the approach leverages the availability of historical network data from similar populations (for instance, when deciding which youths to survey for an HIV prevention intervention, we can deploy policies learned from training on networks gathered at other centers). As a consequence, the agent learns more nuanced policies, which can both be fine-tuned to a particular distribution of graphs and adapt more precisely over the course of the surveying process itself (since the trained policy is a function of the graph discovered so far). Even if training networks are not available from the specific domain of interest, the authors show that comparable performance can be obtained by training on standard social network datasets, or on datasets generated synthetically using community structure information, and transferring the learned policy unchanged to the target setting.
Efficient Object Detection in Large Images Using Deep Reinforcement Learning
Burak Uzkent, Christopher Yeh, Stefano Ermon; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2020, pp. 1824-1833
In this paper, the authors address the problem that purchasing high spatial resolution images is expensive in some application domains, such as remote sensing. To reduce the large computational and monetary cost associated with using high spatial resolution images, they propose a reinforcement learning agent that adaptively selects the spatial resolution of each image provided to the detector. In particular, they train the agent in a dual reward setting to choose low spatial resolution images, run through a coarse-level detector, when the image is dominated by large objects, and high spatial resolution images, run through a fine-level detector, when it is dominated by small objects. This reduces the dependency on high spatial resolution images for building a robust detector and increases run-time efficiency. In experiments on the xView dataset, which consists of large images, they increase run-time efficiency by 50% while maintaining accuracy similar to a detector that uses only high-resolution images.
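A toy sketch of the dual-reward idea (the costs, accuracies, and "object size" proxy below are invented for illustration, not taken from the xView experiments): a Bernoulli policy learns, per coarse object-size bucket, when requesting high resolution is worth its acquisition cost.

```python
import numpy as np

rng = np.random.default_rng(9)
HIGH_RES_COST = 0.3            # assumed acquisition/compute cost of requesting high resolution

def detection_accuracy(object_size, use_high_res):
    # assumed behavior: small objects need high resolution, large ones do not
    if object_size > 0.5:
        return 0.9
    return 0.85 if use_high_res else 0.4

theta = np.zeros(10)           # per object-size bucket: logit of choosing high resolution
lr = 0.2

for step in range(20000):
    object_size = rng.random()                     # cheap coarse proxy visible before deciding
    b = min(int(object_size * 10), 9)
    p_high = 1 / (1 + np.exp(-theta[b]))
    use_high = rng.random() < p_high
    reward = detection_accuracy(object_size, use_high) - (HIGH_RES_COST if use_high else 0.0)
    # REINFORCE-style update of the Bernoulli policy
    theta[b] += lr * reward * ((1.0 if use_high else 0.0) - p_high)

print("P(request high resolution) by object-size bucket:",
      (1 / (1 + np.exp(-theta))).round(2))
```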
This reinforcement learning model is designed so that an agent in a dual reward setting can automatically detect whether an image is dominated by large or small objects and choose a different spatial resolution accordingly. This could be very useful for our final project, since we have two different kinds of data under one label. If we feed them into the same model, prediction accuracy could be very low, because one group of data could be treated as noise. But if we use reinforcement learning to help the model distinguish between the two groups under one label, the model can extract different features from each group and combine them, which could increase the accuracy of our model.
We are focusing on satellite images in our final project, so the potential data would be images of non-gentrified areas. There are two kinds of non-gentrified data: images of low-income areas that could be gentrified one day, and images of high-income areas that will likely never be gentrified. Reinforcement learning is well suited to handling the potential problem that the model cannot distinguish these two types.
Luceri, L., Giordano, S., & Ferrara, E. (2020, May). Detecting troll behavior via inverse reinforcement learning: A case study of Russian trolls in the 2016 US election. In Proceedings of the International AAAI Conference on Web and Social Media (Vol. 14, pp. 417-427).
Modern Perspectives on Reinforcement Learning in Finance https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3449401
Building a Foundation for Data-Driven, Interpretable, and Robust Policy Design using the AI Economist Link: https://arxiv.org/pdf/2108.02904.pdf
Reinforcement Learning, Fast and Slow https://www.sciencedirect.com/science/article/pii/S1364661319300610?via%3Dihub
Post a link for a "possibility" reading of your own on the topic of Reinforcement Learning [for week 8], accompanied by a 300-400 word reflection that: 1) briefly summarizes the article (e.g., as we do with the first “possibility” reading each week in the syllabus), 2) suggests how its method could be used to extend social science analysis, 3) describes what social data you would use to pilot such a use with enough detail that someone could move forward with implementation.