Lindström, B., Bellander, M., Schultner, D.T., Chang, A., Tobler, P.N., & Amodio, D.M. (2021). A computational reward learning account of social media engagement. Nature Communications, 12, 1311. https://doi.org/10.1038/s41467-020-19607-x
Link: https://www.nature.com/articles/s41467-020-19607-x
1) While social media has become a central method for communication and interaction in modern life, it has also become an arena seemingly driven by rewards, or "likes," on posts. However, despite this apparent action/reward system, there is limited empirical evidence that such a system actually governs behavior. This work applies computational methods based on reinforcement learning theory to show that human behavior on social media conforms both qualitatively and quantitatively to the ideas of reward learning. Specifically, the authors base their models on free-operant behavior in non-human animals: users try to maximize engagement with their posts while weighing the cost of posting against the cost of inaction (not engaging with other people's posts).
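To make the mechanism concrete, here is a minimal sketch (with made-up parameters; this is not the authors' fitted model) of a reward-learning agent that tracks the average number of likes per post and raises or lowers its posting rate as the expected reward outweighs, or fails to outweigh, the effort cost:

```python
import numpy as np

rng = np.random.default_rng(0)

true_likes_per_post = 5.0   # hypothetical environment parameter
effort_cost = 3.0           # assumed cost of composing a post
alpha = 0.1                 # learning rate for the reward estimate

reward_estimate = 0.0       # learned average likes per post
post_rate = 0.5             # probability of posting in a given period

for t in range(1000):
    if rng.random() < post_rate:
        likes = rng.poisson(true_likes_per_post)
        # update the running estimate of reward per post
        reward_estimate += alpha * (likes - reward_estimate)
    # posting becomes more frequent when expected net reward (likes minus effort) is positive
    net_value = reward_estimate - effort_cost
    post_rate = 1.0 / (1.0 + np.exp(-net_value))

print(f"learned reward per post = {reward_estimate:.2f}, posting rate = {post_rate:.2f}")
```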
The AI Economist: Taxation policy design via two-level deep multiagent reinforcement learning
Link: https://www.science.org/doi/10.1126/sciadv.abk2607
Hierarchical Reinforcement Learning for Open-Domain Dialog Authors: Saleh, A., Jaques, N., Ghandeharioun, A., Shen, J., & Picard, R. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05), 8741-8748. https://ojs.aaai.org//index.php/AAAI/article/view/6400
1) briefly summarizes the article (e.g., as we do with the first “possibility” reading each week in the syllabus),
The paper aims to improve open-domain dialog systems using hierarchical reinforcement learning. Open-domain, or open-ended, dialog systems try to establish long-term connections with users by satisfying the human need for communication, affection, and social belonging, rather than completing a specific task. However, such systems face four main issues: a tendency toward malicious, aggressive, biased, or offensive responses; low quality and sensitivity of the generated text; a bias toward dull and repetitive text caused by maximum likelihood estimation (MLE) training; and difficulty tracking long-term aspects of the conversation. The authors address these problems by developing Variational Hierarchical Reinforcement Learning (VHRL), which uses policy gradients to adjust the prior probability distribution of the latent variable learned at the utterance level of a hierarchical variational model. In addition, they draw on toxicity, the psychology of good conversation, and other conversation metrics to construct human-centered rewards. Their evaluation includes an external interactive human evaluation platform, which showed that their solution outperformed other dialog architectures in human judgments of conversational quality.
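As a rough illustration of what a policy-gradient update at the utterance level looks like (a toy sketch, not VHRL itself: the "latent" here is just a choice among three hypothetical response styles, and the human-centered reward is a made-up table combining an engagement bonus and a toxicity penalty):

```python
import numpy as np

rng = np.random.default_rng(1)
styles = ["empathetic", "neutral", "sarcastic"]     # stand-in for the utterance-level latent
reward_table = {"empathetic": 1.0, "neutral": 0.3, "sarcastic": -0.5}   # assumed human-centered reward

theta = np.zeros(len(styles))   # policy logits over utterance styles
lr = 0.05

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for episode in range(2000):
    probs = softmax(theta)
    a = rng.choice(len(styles), p=probs)
    reward = reward_table[styles[a]] + rng.normal(scale=0.1)   # noisy human feedback
    # REINFORCE: gradient of log pi(a) for a categorical policy is one_hot(a) - probs
    grad = -probs
    grad[a] += 1.0
    theta += lr * reward * grad

print({s: round(float(p), 2) for s, p in zip(styles, softmax(theta))})
```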
2) suggests how its method could be used to extend social science analysis,
This approach of using VHRL to improve open-domain dialog could be applied widely in social science research and applications, such as improving user experience, customer communication, personalized suggestions, and news recommendations, and even in gaming (training an algorithm to win a game).
3) describes what social data you would use to pilot such a use with enough detail that someone could move forward with implementation.
As our project is related to meme classification and meme generation, I imagine this approach would be useful for meme generation: rewarding good humor and penalizing toxicity (hateful, misogynistic memes) to avoid a bias toward malicious, aggressive, biased, or offensive generated memes.
Zheng, S., Trott, A., Srinivasa, S., Parkes, D., & Socher, R. (2022). The AI Economist: Optimal Economic Policy Design via Two-level Deep Reinforcement Learning. https://www.science.org/doi/pdf/10.1126/sciadv.abk2607?download=true
In this paper, the researchers design an algorithm to solve the economists' social planner problem. Many theoretical and empirical solutions have been proposed in the literature, but all have shortcomings due to the complexity of the problem and the insufficiency of historical data. The proposed algorithm comprises one deep reinforcement learning model at the individual level and another at the level of the social planner. Each actor learns a behavioral policy and then iteratively uses its value function to decide on new actions. The authors use this model to design an optimal tax policy. It should be noted that the optimization happens at a macroeconomic level, so not all individual-level interactions are taken into account. The authors use simulated data of 100 agents. The simulation includes spatial factors, individual characteristics, and potential trading behaviors.
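A heavily simplified sketch of the two-level structure (the flat tax, lump-sum rebate, tabular learners, and productivity-times-equality welfare signal below are my own assumptions for illustration, not the paper's setup): inner agents learn how much to work given a tax rate, while an outer planner searches over tax rates to maximize a social-welfare signal.

```python
import numpy as np

rng = np.random.default_rng(2)
skills = np.array([1.0, 2.0, 4.0])            # heterogeneous agent skills (assumed)
labor_options = np.linspace(0, 1, 11)         # possible labor supplies
tax_options = np.linspace(0, 0.9, 10)         # planner's candidate flat tax rates

agent_q = np.zeros((len(skills), len(tax_options), len(labor_options)))
planner_q = np.zeros(len(tax_options))
alpha, eps = 0.1, 0.1

for step in range(20000):
    # outer level: planner picks a tax rate (epsilon-greedy)
    t_idx = rng.integers(len(tax_options)) if rng.random() < eps else planner_q.argmax()
    tax = tax_options[t_idx]

    # inner level: each agent picks how much to work under that tax rate
    a_idx = np.array([
        rng.integers(len(labor_options)) if rng.random() < eps else agent_q[i, t_idx].argmax()
        for i in range(len(skills))
    ])
    labor = labor_options[a_idx]
    pretax = skills * labor
    rebate = tax * pretax.sum() / len(skills)           # lump-sum redistribution of tax revenue
    incomes = (1 - tax) * pretax + rebate
    utilities = incomes - labor ** 2                    # income minus labor disutility

    for i in range(len(skills)):
        agent_q[i, t_idx, a_idx[i]] += alpha * (utilities[i] - agent_q[i, t_idx, a_idx[i]])

    # planner reward: total income scaled by (1 - a Gini-like inequality measure)
    gini = np.abs(incomes[:, None] - incomes[None, :]).mean() / (2 * incomes.mean() + 1e-9)
    planner_q[t_idx] += alpha * (incomes.sum() * (1 - gini) - planner_q[t_idx])

print("planner's preferred tax rate =", round(float(tax_options[planner_q.argmax()]), 2))
```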
Macroeconomists often deal with optimization problems. Many of these problems can neither be solved theoretically because of their complexity nor econometrically because of the insufficiency of historical data. Taxation is just one example. One could think of similar problems in determining interest rates as another example. I believe the proposed framework could have many applications in solving such optimization problems in macroeconomics.
As described above, Zheng et al. (2022) only use simulated data in this study. It is a common practice for macroeconomists to test their models using such simulated data. There are two reasons for this. First, unlike real-world data points, such datasets are free of any irrational behavior and additional complexity. Second, simulation is just easier than accessing and cleaning macro data. However, as the authors themselves explain, the next step of the analysis would be testing the model with real-world data.
Title: Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning Authors: Michael Bradley Johanson, Edward Hughes, Finbarr Timbers, and Joel Z. Leibo Link: https://arxiv.org/pdf/2205.06760.pdf
Comprehensive Review of Deep Reinforcement Learning Methods and Applications in Economics
Paper: Segregation dynamics with reinforcement learning and agent based modeling https://www.nature.com/articles/s41598-020-68447-8
In short, this paper combines reinforcement learning and agent-based modeling to study residential segregation. The classic Schelling segregation model pioneered this area, arguing that even mild personal preferences for living near people of the same type can lead to residential segregation. The authors' study is inspired by the reward and interaction rules of the Schelling model. The advantage of the model in this paper is that it promotes the creation of interdependencies and interactions among multiple agents of two different kinds that would otherwise segregate from each other. They find that spatial integration can be achieved by establishing interdependencies among agents of different kinds. They also show that segregated areas are more likely to host older agents, while diverse areas attract younger ones.
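For a sense of how RL plugs into a Schelling-style ABM, here is a minimal sketch (my own toy setup, not the paper's model): agents on a grid learn, via tabular Q-learning, whether to stay or relocate, with a reward encoding a mild same-type preference and a small moving cost.

```python
import numpy as np

rng = np.random.default_rng(3)
size, n_agents = 20, 300
grid = -np.ones((size, size), dtype=int)              # -1 = empty cell, 0/1 = agent type
cells = rng.permutation(size * size)[:n_agents]
grid[np.unravel_index(cells, grid.shape)] = rng.integers(0, 2, n_agents)

def same_share(r, c):
    """Share of same-type agents among the occupied neighbors of cell (r, c)."""
    t = grid[r, c]
    block = grid[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2].ravel()
    nbrs = block[block >= 0]
    same, total = (nbrs == t).sum() - 1, len(nbrs) - 1    # exclude the agent itself
    return 0.5 if total == 0 else same / total

Q = np.zeros((11, 2))                                  # states: same-share decile; actions: stay, move
alpha, eps, move_cost = 0.1, 0.1, 0.1

for step in range(50000):
    occupied = np.argwhere(grid >= 0)
    r, c = occupied[rng.integers(len(occupied))]
    s = int(round(same_share(r, c) * 10))
    a = rng.integers(2) if rng.random() < eps else Q[s].argmax()
    if a == 1:                                         # relocate to a random empty cell
        empty = np.argwhere(grid < 0)
        er, ec = empty[rng.integers(len(empty))]
        grid[er, ec], grid[r, c] = grid[r, c], -1
        r, c = er, ec
    # mild Schelling-style preference: reward saturates at a same-type share of 0.5
    reward = min(same_share(r, c), 0.5) - (move_cost if a == 1 else 0.0)
    Q[s, a] += alpha * (reward - Q[s, a])

print("learned stay/move values per same-share decile:\n", Q.round(2))
```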
I think the combination of reinforcement learning and agent-based modeling could bring new life to traditional social science areas, especially the simulation of social patterns. We can use multi-agent reinforcement learning to represent various kinds of people and reverse-engineer how people's real-world behaviors are motivated. In other words, we can design learning rewards and punishments according to our hypothesis, implement the model according to the rules we set, and then compare the model's results with real-world data to see whether our hypothesis holds. This could boost the study of human behavior at a large scale, especially in scenarios where people need to interact with others. Further, we could even use the verified model to propose directions for policy making: by observing how agents' behavior changes, we can gauge the influence of policies and thereby get a prior sense of how they would affect the targeted people in the real world (though this could be a bit controversial).
I would probably use the combination of agent-based modeling and reinforcement learning to study opinion polarization on social media platforms. On some platforms, people post and others can upvote or downvote the post, or respond positively or negatively to it; on platforms like Twitter you can even retweet the post. Some previous studies have indicated that small groups may form and that opinions across those groups become more distant from each other. I would like to test both the effect of homophily and the effect of social reward/punishment. First, I would explore whether a preference for homophily helps the formation of groups/sub-networks in the model, and then check the similarity of posts on the platform to verify the results obtained by the model. I would also use downvotes and upvotes as social rewards for my agents, to see how they are motivated toward different types of posts depending on the period they are in. For example, if they are involved in a small group, will they post more posts that are similar to the style of that group? What would be different if they were not affiliated with small groups? I think this can help answer how we behave in the era of the internet.
Wang, Y., Yang, W., Ma, F., Xu, J., Zhong, B., Deng, Q., & Gao, J. (2020, April). Weak supervision for fake news detection via reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, No. 01, pp. 516-523).
Solving the problem of unreliable labeled samples has always been a troubling part of fake news detection. In this article, the authors propose a weak-supervision framework for fake news detection called WeFEND. The framework is divided into an annotator, a reinforced selector, and a fake news detector. First, based on users' reports, the annotator assigns weak labels to the data. Then a reinforcement learning process picks high-quality samples. Finally, the selected labeled data are passed to the fake news detector. In their experiments, the model trained on automatically annotated data outperforms other models in every category except the recall for discerning real news, with scores ranging from 75% to 88%.
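To illustrate what a reinforced selector can look like in miniature (a toy sketch on synthetic data, not the WeFEND implementation: the "detector" is a nearest-centroid classifier, and the selector is a per-bucket REINFORCE policy rewarded by validation-accuracy gains):

```python
import numpy as np

rng = np.random.default_rng(4)

def make_data(n):
    x = rng.normal(size=(n, 2))
    y = (x[:, 0] + x[:, 1] > 0).astype(int)     # true labels (unknown during training)
    quality = rng.random(n)                     # e.g., reliability of the user report
    flipped = rng.random(n) > quality           # low-quality reports flip labels more often
    weak = np.where(flipped, 1 - y, y)
    return x, y, weak, quality

x_tr, y_tr, weak_tr, q_tr = make_data(500)
x_val, y_val, _, _ = make_data(200)             # small trusted validation set

theta = np.zeros(10)                            # keep-logit per report-quality bucket
buckets = np.minimum((q_tr * 10).astype(int), 9)
lr = 0.5

def detector_accuracy(keep):
    """Validation accuracy of a nearest-centroid 'detector' trained on the kept samples."""
    k0, k1 = keep & (weak_tr == 0), keep & (weak_tr == 1)
    if k0.sum() < 5 or k1.sum() < 5:
        return 0.5
    c0, c1 = x_tr[k0].mean(axis=0), x_tr[k1].mean(axis=0)
    pred = (np.linalg.norm(x_val - c1, axis=1) < np.linalg.norm(x_val - c0, axis=1)).astype(int)
    return (pred == y_val).mean()

baseline = detector_accuracy(np.ones(len(x_tr), dtype=bool))
for episode in range(300):
    p_keep = 1 / (1 + np.exp(-theta[buckets]))
    keep = rng.random(len(x_tr)) < p_keep
    reward = detector_accuracy(keep) - baseline          # improvement over keeping everything
    # REINFORCE update of the per-bucket keep probabilities
    grad = np.zeros_like(theta)
    np.add.at(grad, buckets, (keep - p_keep) * reward)
    theta += lr * grad / len(x_tr)

print("learned keep probability by report-quality bucket:",
      (1 / (1 + np.exp(-theta))).round(2))
```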
This is a paper applying reinforcement learning in the context of social media. The authors use report data as weak labels for predicting over a larger set of news. This method could similarly be applied to other social media platforms that rely on community policing; the unreliability of community policing makes these platforms a natural fit for experimenting with such a model. I also wonder how the different landscapes of different social media platforms might cause the prediction results to vary.
Admittedly, data similar to that used in this study is hard to find. But I think sentiment analysis, the other classification task popular in social media analysis, could be conducted in a similar way. For example, we may not be able to get users' reports, but we can access the comments on a set of articles on social media. The sentiments expressed in the comments can be treated as weak labels and then fed into the same process.
Tampuu, Ardi, Tambet Matiisen, Dorian Kodelja, Ilya Kuzovkin, Kristjan Korjus, Jaan Aru, Raul Vicente, and Juhan Aru. “Multiagent Cooperation and Competition with Deep Reinforcement Learning.” PLoS ONE 12, no. 4 (April 5, 2017): 1–15. doi:10.1371/journal.pone.0172395.
This paper employs the deep Q-learning framework in a setting involving multiple individuals to find out when they compete or cooperate with each other. Using the video game Pong, the researchers analyze how and when competitive or cooperative behavior emerges and how shifting incentives can shift competition to cooperation. The paper also demonstrates that playing against other autonomous agents leads to more diverse strategies than playing against hard-wired algorithms.
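The key lever here is the reward schedule. A minimal sketch of how such a schedule can be parameterized (the exact values are assumed for illustration): the agent that loses the ball always receives -1, while the scoring agent receives rho, which sweeps from fully competitive (+1) to fully cooperative (-1).

```python
def pong_rewards(scorer: str, rho: float) -> dict:
    """Return per-agent rewards for a single scoring event.

    rho = +1.0 : classic competitive Pong (scoring is rewarded)
    rho = -1.0 : losing the ball is costly for both players, encouraging cooperation
    """
    if scorer == "left":
        return {"left": rho, "right": -1.0}
    return {"left": -1.0, "right": rho}

# sweep from competition to cooperation
for rho in (1.0, 0.0, -1.0):
    print(rho, pong_rewards("left", rho))
```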
Encouraging the cooperation of autonomous individuals is one of the key questions in social sciences. While Adam Smith argued that competition between individuals would maximize the utility of society, the collective action problem, exemplified in the tragedy of the commons and the prisoner’s dilemma, illustrates that it is imperative for a fully functioning society to foster cooperation in one way or another. While scholars such as Elinor Ostrom strived to find solutions to these problems, critics have pointed out the difficulties of applying those resolutions en masse. Deep reinforcement learning might offer a solution to this conundrum. By studying when and how autonomous entities decide to cooperate, social scientists could now understand the conditions that generate cooperation between rational individuals. Moreover, creating complex games that resemble specific social situations could also offer a more direct resolution to social circumstances. For instance, the lemons problem is a situation where cooperation is required to maximize the utility of society. Employing deep reinforcement learning in this situation may offer policy advice on how to create a market where good cars could be sold at a reasonable price.
Elinor Ostrom’s book Governing the Commons exemplifies numerous cases where societies avoided the tragedy of the commons. Creating a game that resembles the cases, and eliminating certain conditions one by one could teach us which condition plays the most important role in reinforcing cooperation.
Ciranka, S., Linde-Domingo, J., Padezhki, I., Wicharz, C., Wu, C. M., & Spitzer, B. (2022). Asymmetric reinforcement learning facilitates human inference of transitive relations. Nature Human Behaviour, 6(4), 555-564.
1) briefly summarizes the article (e.g., as we do with the first “possibility” reading each week in the syllabus) Transitive inference means inferring the relation between two items from their existing relations with other items (e.g., if A beats B and B beats C, then A beats C). This study proposes that humans infer such novel relations via an asymmetric learning policy, which differs from a simple reinforcement learning mechanism. Across four experiments, the authors find that observers update their beliefs about the winner (or loser) of a pair asymmetrically, which supports the asymmetric learning policy. In addition, in simulations they find that value estimates for the winning and losing items come out similar under full feedback, while under partial feedback the asymmetric model, which updates only one side, works best. Therefore, human learners can learn transitive relations strategically even with only sparse feedback.
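The core update rule is simple enough to sketch directly (the learning rates below are assumed, not the paper's fitted parameters): after observing that one item beats another, the learner raises the winner's value and lowers the loser's value at different rates, and novel (transitive) comparisons are then read off the learned values.

```python
import numpy as np

rng = np.random.default_rng(5)
n_items = 7
true_rank = np.arange(n_items)          # ground-truth ordering: item 0 < item 1 < ... < item 6
V = np.zeros(n_items)                   # learned item values
alpha_win, alpha_lose = 0.3, 0.05       # asymmetric learning rates (assumed, not fitted)

for trial in range(2000):
    i, j = rng.choice(n_items, size=2, replace=False)
    winner, loser = (i, j) if true_rank[i] > true_rank[j] else (j, i)
    # prediction error based on the current value difference
    p_win = 1 / (1 + np.exp(-(V[winner] - V[loser])))
    V[winner] += alpha_win * (1 - p_win)    # winner updated strongly
    V[loser] -= alpha_lose * (1 - p_win)    # loser updated weakly (asymmetry)

# transitive inferences can now be read off the learned values
print("learned values:", V.round(2))
print("item 5 inferred to beat item 1?", bool(V[5] > V[1]))
```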
2) suggests how its method could be used to extend social science analysis: If A likes B and B likes C, will A like or dislike C? The approach in this paper could help us estimate such relations. Social preference, in social media and in real life, is important for revealing homophily, especially when we only have a partial network or a partial sample. With this method, we could either complete the network based on inference or test existing theories about relationships.
3) describes what social data you would use to pilot such a use with enough detail that someone could move forward with implementation In our project, we use urban economic data and satellite data. Could we estimate that one area is more gentrified than another, and so on, to obtain a complete scale of the degree of gentrification instead of a binary classification? In addition, could we include such relations as features of the observations and feed them into the prediction model?
Multi-modal Active Learning From Human Data: A Deep Reinforcement Learning Approach
This work focuses on using reinforcement learning for active learning on multimodal data. Previous research focused on using RL for machine translation with multimodal data or for uni/multimodal question answering. Multimodal data is a treasure trove of personalised information, but there are two major roadblocks: generic supervised classifiers don't work well on predefined labels across different people, and labelling multimodal data takes substantial effort and is error-prone. The authors address these with a deep reinforcement learning approach to active learning.
Researchers have long bemoaned the effort it takes to label large amounts of data efficiently and have resorted to human annotation. Naive active learning techniques like uncertainty sampling and entropy-based methods haven't been very effective. This research has the potential to build customised classifiers for social science research from a limited amount of high-quality labeled data. The application areas could range from HCI (personalised robots) to studying the behaviour of people.
One potential area where this could be applied is our final project, where we are trying to study the behaviour of different U.S. presidents on different platforms. We could run a pilot study using multimodal data collected from archives and other internet sources and train the Q-function to choose the best data for us to label, in turn building a personalised classifier for each of the presidents rather than simply using the same pretrained and fine-tuned model for all the presidents in the study.
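A rough sketch of what such a pilot loop could look like (synthetic features, a nearest-centroid stand-in for the classifier, and a tabular Q-function over confidence buckets are all my simplifications, not the paper's architecture): the agent decides, sample by sample, whether paying the labeling cost is worth the expected gain in validation accuracy.

```python
import numpy as np

rng = np.random.default_rng(6)

def make_data(n):
    x = rng.normal(size=(n, 4))                        # stand-in for fused multimodal features
    y = (x @ np.array([1.0, -1.0, 0.5, 0.0]) > 0).astype(int)
    return x, y

x_pool, y_pool = make_data(400)
x_val, y_val = make_data(200)

labeled_x, labeled_y = list(x_pool[:20]), list(y_pool[:20])   # small seed set of human labels
Q = np.zeros((5, 2))                                   # 5 confidence buckets x {skip, query}
alpha, eps, label_cost = 0.2, 0.1, 0.02

def centroid_fit_and_score():
    """Fit a nearest-centroid classifier on the labeled set; return val accuracy and centroids."""
    X, Y = np.array(labeled_x), np.array(labeled_y)
    c0, c1 = X[Y == 0].mean(axis=0), X[Y == 1].mean(axis=0)
    pred = (np.linalg.norm(x_val - c1, axis=1) < np.linalg.norm(x_val - c0, axis=1)).astype(int)
    return (pred == y_val).mean(), c0, c1

acc, c0, c1 = centroid_fit_and_score()
for i in range(20, len(x_pool)):
    d0, d1 = np.linalg.norm(x_pool[i] - c0), np.linalg.norm(x_pool[i] - c1)
    confidence = abs(d0 - d1) / (d0 + d1)              # crude confidence on this sample
    s = min(int(confidence * 5), 4)
    a = rng.integers(2) if rng.random() < eps else Q[s].argmax()
    reward = 0.0
    if a == 1:                                         # ask the human annotator for a label
        labeled_x.append(x_pool[i])
        labeled_y.append(y_pool[i])
        new_acc, c0, c1 = centroid_fit_and_score()
        reward = (new_acc - acc) - label_cost
        acc = new_acc
    Q[s, a] += alpha * (reward - Q[s, a])

print("final validation accuracy:", round(float(acc), 3))
print("Q-values (skip vs. query) by confidence bucket:\n", Q.round(3))
```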
Martinez, A. D., Del Ser, J., Osaba, E., & Herrera, F. (2021). Adaptive multi-factorial evolutionary optimization for multi-task reinforcement learning. IEEE Transactions on Evolutionary Computation.
Controlling the Risk of Conversational Search via Reinforcement Learning https://doi.org/10.1145/3442381.3449893
Users often formulate their search queries and questions in immature language, without well-developed keywords or complete structure. Such queries are likely to fail to express their true information needs and to raise ambiguity, as fragmentary language often admits various interpretations and aspects. In this work, the authors propose a risk-aware conversational search agent that uses reinforcement learning to balance the risk of answering a user's query against the option of asking a clarifying question.
The decision-maker uses a deep Q-network (DQN) to decide between answering the query with the best answer and asking the best clarifying question. The DQN uses a BERT-based encoder to encode the initial query q, the context history h, and the top k clarifying questions {cq_1, ..., cq_k}, taking the [CLS] token vectors as their feature representations. Finally, it concatenates all the features and generates a 2 × 1 decision vector. It also reads the reranking scores s^cq_{1:k} and s^ans_{1:k} of the top k questions and answers from the reranker output.
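At the level of tensor shapes, the decision step is easy to sketch (random stand-ins below for the BERT [CLS] embeddings, the reranker scores, and the DQN head; I am also assuming the scores are simply concatenated with the embeddings, which the paper may handle differently):

```python
import numpy as np

rng = np.random.default_rng(7)
d, k = 768, 3                                  # embedding size and number of candidates

cls_query = rng.normal(size=d)                 # [CLS] vector of the initial query q
cls_history = rng.normal(size=d)               # [CLS] vector of the context history h
cls_questions = rng.normal(size=(k, d))        # [CLS] vectors of the top-k clarifying questions
scores_q = rng.random(k)                       # reranker scores for the top-k questions
scores_a = rng.random(k)                       # reranker scores for the top-k answers

features = np.concatenate([cls_query, cls_history, cls_questions.ravel(), scores_q, scores_a])

# a single linear layer standing in for the DQN head
W = rng.normal(scale=0.01, size=(2, features.size))
q_values = W @ features                        # Q(answer now), Q(ask clarifying question)
action = ["answer", "ask_clarifying_question"][int(q_values.argmax())]
print(q_values.round(3), "->", action)
```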
This reinforcement learning setup can be adapted to many social science research cases with diverse risks or costs for decision-makers. A unified measurement of reward and penalty is required to make it work, but it can be simulated, as the authors did with random sampling and user simulation. It can also be applied to classification problems where labeling data is expensive or difficult, since the reinforcement approach could take the user's reaction as the output label.
Reinforcement learning algorithms can simulate from logged, real experience and model alternative outcomes under counterfactual actions, preventing model biases derived from complex environments. This paper proposes the Counterfactually-Guided Policy Search (CF-GPS) algorithm for learning policies in POMDPs from off-policy experience (i.e., from policies that were not actually executed). CF-GPS leverages a model to explicitly consider alternative outcomes, allowing the algorithm to make better use of experience data. The authors find empirically that the advantages of CF-GPS translate into improved policy evaluation and search.
This paper applies a causal framework to reinforcement learning model design. The authors connect concepts from reinforcement learning and causal inference, showing that counterfactual reasoning in structural causal models on off-policy data can facilitate solving non-trivial reinforcement learning tasks. They also re-interpret two previous algorithms as alternative counterfactual methods. CF-GPS assumes there are no additional hidden confounders in the environment; otherwise the model could wrongly attribute a negative outcome to the agent's actions rather than to environmental factors.
I think CF-GPS could be extended to further policy fields, such as international policy, where nation-states are assumed to be the principal actors. The interactions among countries and the decision-making of international organizations could then be simulated, and the effects of counterfactual actions could be evaluated.
Reinforcement Learning for Mean Field Games, with Applications to Economics Andrea Angiuli, Jean-Pierre Fouque, Mathieu Lauriere Link to Paper
Influence maximization in unknown social networks: Learning Policies for Effective Graph Sampling Link: https://arxiv.org/abs/1907.11625
In this paper, the authors propose a reinforcement learning framework to discover effective network sampling heuristics by leveraging automatically learned node and graph representations that encode important structural properties of the network. At training time, the method identifies portions of the network such that the nodes selected from this sampled subgraph can effectively influence nodes in the complete network. The output of this training is a transferable, adaptive policy that identifies an effective sequence of nodes to query on unseen graphs.
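A much-simplified sketch of the problem setting (not the paper's learned-representation policy: here the agent only learns, bandit-style, which of two hand-made querying heuristics reveals the more influential seed nodes, with reward given by a simulated independent-cascade spread over the full hidden graph):

```python
import numpy as np

rng = np.random.default_rng(8)
n, edge_p, budget, n_seeds = 100, 0.04, 10, 3
A = rng.random((n, n)) < edge_p
A = np.triu(A, 1)
A = A | A.T                                            # undirected random graph (hidden from the agent)

def simulate_spread(seeds, prob=0.1, runs=20):
    """Average size of an independent-cascade spread started from the seed nodes."""
    total = 0
    for _ in range(runs):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            new = []
            for u in frontier:
                for v in np.flatnonzero(A[u]):
                    if v not in active and rng.random() < prob:
                        active.add(v)
                        new.append(v)
            frontier = new
        total += len(active)
    return total / runs

heuristics = ["random", "most_referenced"]             # two hand-made querying strategies
Q, counts = np.zeros(len(heuristics)), np.zeros(len(heuristics))

for episode in range(200):
    h = rng.integers(len(heuristics)) if rng.random() < 0.1 else int(Q.argmax())
    observed = {int(rng.integers(n))}                  # start from a single known node
    queried, known_nbrs = set(), {}
    for _ in range(budget):
        candidates = list(observed - queried)
        if not candidates:
            break
        if heuristics[h] == "random":
            u = int(rng.choice(candidates))
        else:      # query the unqueried node most often seen as a neighbor of queried nodes
            u = max(candidates, key=lambda c: sum(c in nbrs for nbrs in known_nbrs.values()))
        nbrs = [int(v) for v in np.flatnonzero(A[u])]
        known_nbrs[u] = nbrs
        queried.add(u)
        observed.update(nbrs)
    # seeds = queried nodes with the most discovered neighbors
    seeds = sorted(known_nbrs, key=lambda v: len(known_nbrs[v]), reverse=True)[:n_seeds]
    reward = simulate_spread(seeds)
    counts[h] += 1
    Q[h] += (reward - Q[h]) / counts[h]

print(dict(zip(heuristics, Q.round(1))))
```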
The method introduced by this paper could be extended to other graph discovery problems by altering the reward function. The learned graph embeddings tended to pick nodes with high betweenness centrality with respect to the entire network, which was key to discovering important portions of the network. A detailed investigation into the complex patterns learned by the model could reveal insights into the structural properties of social networks pertaining to the identification of influential nodes.
In particular, the approach leverages the availability of historical network data from similar populations (for instance, when deciding which youths to survey for an HIV prevention intervention, we can deploy policies learned from training on networks gathered at other centers). As a consequence, the agent learns more nuanced policies, which can both be fine-tuned to a particular distribution of graphs and adapt more precisely over the course of the surveying process itself (since the trained policy is a function of the graph discovered so far). Even if training networks are not available from the specific domain of interest, the authors show that comparable performance can be obtained by training on standard social network datasets, or on datasets generated synthetically using community structure information, and transferring the learned policy unchanged to the target setting.
Efficient Object Detection in Large Images Using Deep Reinforcement Learning
Burak Uzkent, Christopher Yeh, Stefano Ermon; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2020, pp. 1824-1833
In this paper, the authors address the problem that purchasing high spatial resolution images is expensive in some application domains, such as remote sensing. To reduce the large computational and monetary cost associated with using high spatial resolution images, they propose a reinforcement learning agent that adaptively selects the spatial resolution of each image provided to the detector. In particular, they train the agent in a dual reward setting to choose low spatial resolution images, run through a coarse-level detector, when the image is dominated by large objects, and high spatial resolution images, run through a fine-level detector, when it is dominated by small objects. This reduces the dependency on high spatial resolution images for building a robust detector and increases run-time efficiency. In experiments on the xView dataset, which consists of large images, they increase run-time efficiency by 50% while maintaining accuracy similar to a detector that uses only high-resolution images.
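A toy sketch of the dual-reward idea (the costs, accuracies, and "object size" proxy below are invented for illustration, not taken from the xView experiments): a Bernoulli policy learns, per coarse object-size bucket, when requesting high resolution is worth its acquisition cost.

```python
import numpy as np

rng = np.random.default_rng(9)
HIGH_RES_COST = 0.3            # assumed acquisition/compute cost of requesting high resolution

def detection_accuracy(object_size, use_high_res):
    # assumed behavior: small objects need high resolution, large ones do not
    if object_size > 0.5:
        return 0.9
    return 0.85 if use_high_res else 0.4

theta = np.zeros(10)           # per object-size bucket: logit of choosing high resolution
lr = 0.2

for step in range(20000):
    object_size = rng.random()                     # cheap coarse proxy visible before deciding
    b = min(int(object_size * 10), 9)
    p_high = 1 / (1 + np.exp(-theta[b]))
    use_high = rng.random() < p_high
    reward = detection_accuracy(object_size, use_high) - (HIGH_RES_COST if use_high else 0.0)
    # REINFORCE-style update of the Bernoulli policy
    theta[b] += lr * reward * ((1.0 if use_high else 0.0) - p_high)

print("P(request high resolution) by object-size bucket:",
      (1 / (1 + np.exp(-theta))).round(2))
```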
This reinforcement learning model is designed so that an agent in a dual reward setting can automatically detect whether an image is dominated by large or small objects and choose a different spatial resolution accordingly. This could be very useful for our final project, since we have two different kinds of data under one label. If we feed them into the same model, prediction accuracy could be very low, because one group of data could be treated as noise. But if we use reinforcement learning to help the model distinguish between the two groups under one label, the model can extract different features from each group and combine them, which could increase the accuracy of our model.
We are focusing on satellite images in our final project, so the potential data would be images of non-gentrified areas. There are two kinds of non-gentrified data: images of low-income areas that could be gentrified one day, and images of high-income areas that will likely never be gentrified. Reinforcement learning is well suited to handling the potential problem that the model cannot distinguish these two types.
Luceri, L., Giordano, S., & Ferrara, E. (2020, May). Detecting troll behavior via inverse reinforcement learning: A case study of Russian trolls in the 2016 US election. In Proceedings of the International AAAI Conference on Web and Social Media (Vol. 14, pp. 417-427).
Modern Perspectives on Reinforcement Learning in Finance https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3449401
Building a Foundation for Data-Driven, Interpretable, and Robust Policy Design using the AI Economist Link: https://arxiv.org/pdf/2108.02904.pdf
Reinforcement Learning, Fast and Slow https://www.sciencedirect.com/science/article/pii/S1364661319300610?via%3Dihub
Post a link for a "possibility" reading of your own on the topic of Reinforcement Learning [for week 8], accompanied by a 300-400 word reflection that: 1) briefly summarizes the article (e.g., as we do with the first “possibility” reading each week in the syllabus), 2) suggests how its method could be used to extend social science analysis, 3) describes what social data you would use to pilot such a use with enough detail that someone could move forward with implementation.