Thinking-with-Deep-Learning-Spring-2024 / Readings-Responses

You can post your reading responses in this repository.

Week 9. May. 17: Reinforcement Learning - Possibilities #18

Open JunsolKim opened 2 months ago

JunsolKim commented 2 months ago

Pose a question about one of the following articles:

- “Human-level control through deep reinforcement learning” (2015). V. Mnih...D. Hassabis. Nature 518: 529–533.
- “Learning ‘What-if’ Explanations for Sequential Decision-Making” (2021).
- “Improved protein structure prediction using potentials from deep learning” (2020).
- “Machine Theory of Mind” (2018).
- “Explainability in deep reinforcement learning” (2021).

kceeyang commented 4 weeks ago

300-400 word reflection:

For this week, we are studying reinforcement learning and other related topics. I think the reading “Inverse Reinforcement Learning without Reinforcement Learning” gave a good overview of the limitations and benefits of inverse reinforcement learning, as well as a great discussion of its potential applications. In this paper, the authors first explain that inverse reinforcement learning is the problem of inferring a reward function under which the demonstrated behavior is optimal. They then introduce the advantages and challenges of standard inverse reinforcement learning methods. The three major benefits (policy space structuring, transfer across problems, and robustness to compounding errors) are what allow inverse reinforcement learning to deliver state-of-the-art results on imitation problems. The price of that robustness to compounding errors, however, is that these methods must repeatedly solve a reinforcement learning problem in their inner loop. Each inner-loop iteration is therefore expensive, since the easier problem of imitation is reduced to repeatedly solving the harder problem of reinforcement learning.

Hence, the authors propose two algorithms that speed up the reinforcement learning subroutine using expert resets. Both MMDP (Moment Matching by Dynamic Programming), which learns a sequence of policies, and NRMM (No-Regret Moment Matching), which learns a single stationary policy, reset the learner to states from the expert demonstrations, cutting out unnecessary exploration in parts of the state space the expert never visits. In their experiments, the authors show that these expert-reset approaches are significantly more sample efficient than traditional inverse reinforcement learning methods on problems where exploration is challenging, such as the AntMaze environments, as well as in settings where exploration is less of a burden, such as the PyBullet tasks.
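A minimal sketch of the reset mechanic only, on a toy chain MDP with a hypothetical `expert_states` list; this is ordinary Q-learning with expert-state resets, not the paper's MMDP or NRMM moment-matching algorithms:

```python
import random

# Toy chain MDP: states 0..N-1, actions {0: left, 1: right}; reward only at the goal.
# Episodes start from states the (hypothetical) expert visited, so the learner does
# not waste samples exploring regions the expert never reaches.
N_STATES, GOAL, GAMMA, ALPHA, EPS = 10, 9, 0.95, 0.5, 0.1

def step(s, a):
    s_next = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    return s_next, float(s_next == GOAL), s_next == GOAL   # next state, reward, done

expert_states = list(range(N_STATES))   # stand-in for states covered by demonstrations

Q = [[0.0, 0.0] for _ in range(N_STATES)]

for episode in range(500):
    s = random.choice(expert_states)    # expert reset, instead of always starting at s = 0
    for t in range(50):
        a = random.randrange(2) if random.random() < EPS else max((0, 1), key=lambda x: Q[s][x])
        s_next, r, done = step(s, a)
        target = r + (0.0 if done else GAMMA * max(Q[s_next]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s_next
        if done:
            break

print("Greedy action per state:", [max((0, 1), key=lambda x: Q[s][x]) for s in range(N_STATES)])
```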

It would be interesting to see how such sample-efficient inverse reinforcement learning methods could be applied to social navigation tasks. For example, I am curious whether they could be used to learn socially compliant navigation policies more efficiently from observed human trajectories.

maddiehealy commented 4 weeks ago

On the Challenges of Deep Reinforcement Learning

This week I wanted to explore more of the background/impact of reinforcement learning, so I read this article on The Societal Implications of Deep Reinforcement Learning. This article offers a nice overview of how Deep Reinforcement Learning (DRL) will impact different sectors of society as it continues to emerge as a prominent field. The authors approach DRL from the perspective of AI research, emphasizing DRL's real-world applications in areas including robotics, finance, and healthcare.

One of the article's highlighted points is that DRL affords AI systems a higher level of autonomy without explicit programming. This is made possible through reinforcement learning's trial-and-error learning process as well as deep learning's ability to manage high-dimensional data. Together, these characteristics of RL and DL allow DRL agents to learn actions that maximize cumulative rewards over time.

However, the authors note a disparity between DRL's success in controlled environments and its performance in real-world deployment. The rest of the paper dives into the challenges DRL faces in the following areas: (1) human oversight, (2) safety and reliability, (3) reward function design, (4) data collection incentives, (5) security, and (6) the future of work. Briefly, these challenges include the difficulty of maintaining human oversight over rapid decision-making, ensuring safe exploration to prevent errors, designing accurate and ethical reward functions, addressing the privacy concerns raised by extensive data collection, securing DRL systems against cybersecurity attacks, and managing the acceleration of job automation that wider deployment of DRL may bring.

I selected this paper because, as social science researchers, it is important for us to gain further background on the ethical and societal implications of the models we are handling. With this in mind, we can integrate DRL into social science research, where it offers promising avenues for understanding human behavior, policy-making, and societal governance.

XueweiLi1027 commented 3 weeks ago

This week, I recommend a paper that studies political opinion with reinforcement learning. The paper, How Politicians Learn from Citizens' Feedback: The Case of Gender on Twitter, was published in the American Journal of Political Science (AJPS), one of the top political science journals.

Reflection

The article explores how politicians adapt their issue focus based on feedback received from citizens on social media platforms, specifically Twitter. Utilizing a reinforcement learning framework, the study posits that politicians adjust their attention to various issues in response to positive feedback. Through an analysis of 1.5 million tweets by Spanish MPs, the research identifies a gendered difference in feedback, where female politicians receive more positive reactions when discussing gender issues. This dynamic is shown to contribute to a specialization in gender issues among female politicians. The study also delves into mechanisms behind this phenomenon, suggesting that the differential feedback is not due to the content of the tweets but rather the gender of the politician and the congruity between gender and issue focus in the eyes of the public.

The reinforcement learning model applied in this study can extend social science analysis by providing a framework for understanding how politicians' behaviors are influenced by real-time, direct feedback from constituents. This approach can be used to study polarization, the impact of social media on political discourse, and the dynamics of issue attention and agenda-setting by politicians. By applying text analysis and machine learning techniques to social media data, researchers can gain insights into how political communication evolves and how it reflects or influences public opinion.

To pilot the use of this method, researchers could gather social media data from a range of politicians across different regions and political affiliations. For example, one might use Twitter data from members of the U.S. Congress, focusing on their tweets about various policy issues. Data collection would involve tweets, retweets, likes, and replies. Using natural language processing (NLP), similar to the BERT model employed in the study, tweets could be categorized by issue. Feedback could be quantified by engagement metrics like retweets and likes. This data would then be used to estimate a reinforcement learning model, assessing how politicians' issue focus shifts in response to feedback.
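A toy sketch of what such a pilot model could look like, assuming made-up issue categories and engagement numbers; the softmax choice rule and value update below are a generic bandit-style learner, not the estimation procedure used in the AJPS paper:

```python
import math, random

# Hypothetical issue categories and a toy "engagement" signal per issue.
issues = ["economy", "gender", "environment", "security"]
true_mean_engagement = {"economy": 5.0, "gender": 12.0, "environment": 7.0, "security": 4.0}

values = {i: 0.0 for i in issues}   # politician's learned attractiveness of each issue
alpha, beta = 0.1, 0.3              # learning rate and softmax inverse temperature

def softmax_choice(vals):
    weights = [math.exp(beta * vals[i]) for i in issues]
    total = sum(weights)
    return random.choices(issues, weights=[w / total for w in weights])[0]

attention = {i: 0 for i in issues}
for tweet in range(2000):
    issue = softmax_choice(values)
    attention[issue] += 1
    # Feedback: noisy engagement (likes/retweets) around the issue's mean.
    reward = random.gauss(true_mean_engagement[issue], 2.0)
    # Standard prediction-error update: shift the issue's value toward observed feedback.
    values[issue] += alpha * (reward - values[issue])

print("Share of attention per issue:", {i: attention[i] / 2000 for i in issues})
```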

Question

While the study provides a robust analysis of feedback mechanisms on Twitter, a critical question arises regarding the generalizability of these findings. How do these dynamics on social media translate to offline political behavior and policy-making? Additionally, it would be valuable to understand whether the reinforcement learning model holds true for politicians who may not be as active on social media or for those operating in political environments with different media consumption patterns.

guanhongliu2000 commented 3 weeks ago

I didn't find an additional article for this week's topic, so I read the assigned article "Human-level control through deep reinforcement learning" by Volodymyr Mnih et al. This article represents a significant milestone in the field of artificial intelligence, specifically in reinforcement learning (RL) and deep learning. The authors introduce a novel artificial agent, the deep Q-network (DQN), which effectively combines reinforcement learning with deep neural networks to achieve human-level performance in various complex tasks, specifically classic Atari 2600 games.

A central theme of the article is the challenge of creating an agent that can derive efficient representations from high-dimensional sensory inputs and generalize its learning to new situations. This is a formidable task due to the complexity of real-world environments, where useful features cannot always be handcrafted, and states are often partially observable and high-dimensional. The authors successfully address this challenge by leveraging advances in deep neural networks, particularly deep convolutional networks, to process raw sensory data.

One of the notable achievements of the DQN is its ability to learn directly from pixel inputs and game scores, without any additional prior knowledge. This end-to-end learning approach sets it apart from previous methods that relied heavily on hand-designed features and low-dimensional state spaces. The DQN's ability to outperform previous algorithms and achieve a level of play comparable to professional human testers across a diverse set of 49 Atari games underscores its robustness and generalizability.

The stability of reinforcement learning with neural networks is a critical issue addressed by the authors. They introduce two key innovations to enhance stability: experience replay and the use of a separate target Q-network. Experience replay involves storing the agent's experiences and sampling them randomly to break the correlations in the observation sequence, which smooths the learning process. The separate target Q-network reduces correlations between the action-values and target values, preventing oscillations and divergence in the learning process.
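A minimal sketch of these two stabilizers, with placeholder dimensions and random transitions standing in for the Atari emulator (the real DQN uses a convolutional network over stacked frames):

```python
import random
from collections import deque
import torch
import torch.nn as nn

# Minimal sketch of the two DQN stabilizers: a replay buffer and a separate target network.
obs_dim, n_actions, gamma = 4, 2, 0.99

q_net = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))
target_net.load_state_dict(q_net.state_dict())  # start identical
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

replay = deque(maxlen=10_000)  # experience replay buffer of (s, a, r, s', done)

def random_transition():
    # Placeholder transitions; a real agent would act in the environment here.
    s, s2 = torch.randn(obs_dim), torch.randn(obs_dim)
    return (s, random.randrange(n_actions), random.random(), s2, random.random() < 0.05)

for step in range(1_000):
    replay.append(random_transition())
    if len(replay) < 64:
        continue
    # Sampling uniformly at random breaks temporal correlations in the observation sequence.
    batch = random.sample(replay, 64)
    s = torch.stack([b[0] for b in batch])
    a = torch.tensor([b[1] for b in batch])
    r = torch.tensor([b[2] for b in batch])
    s2 = torch.stack([b[3] for b in batch])
    done = torch.tensor([float(b[4]) for b in batch])

    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Targets come from the frozen network, which damps oscillations.
        target = r + gamma * (1 - done) * target_net(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if step % 200 == 0:
        target_net.load_state_dict(q_net.state_dict())  # periodic target sync
```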

The article also highlights the significance of the DQN's architecture, which includes multiple convolutional layers followed by fully connected layers, enabling the network to build hierarchical representations of the input data. This architecture draws inspiration from biological systems, particularly the hierarchical sensory processing observed in the visual cortex.

The results presented in the article are impressive. The DQN not only outperforms existing RL methods on the majority of the Atari games but also achieves human-level performance on several games. This performance is evaluated rigorously, with the DQN agent demonstrating its ability to learn effective strategies and even discover optimal long-term strategies in some games, such as "Breakout."

HongzhangXie commented 3 weeks ago

I am very impressed with the paper "Dynamic coupon targeting using batch deep reinforcement learning: An application to livestream shopping." Deep reinforcement learning (DRL) is a powerful tool for identifying optimal strategies. I have observed numerous studies utilizing DRL to address scientific challenges, such as protein prediction, autonomous driving, and medical strategies. However, research on using DRL to predict human behavior and to conduct high-quality live experiments to evaluate the effectiveness of DRL models seems less common.

In the study, the authors employ batch deep reinforcement learning (BDRL), which relies on Q-learning, a model-free approach that can mitigate model bias, to analyze optimal strategies for distributing coupons in a livestream shopping environment. The authors' BDRL solution doubled the platform's revenue compared to static targeting strategies and showed a 20% increase over model-based solutions. They argue that BDRL allows for more effective and automated consumer targeting based on consumer heterogeneity and dynamics. The approach was then validated with an "out-of-sample" live experiment on the platform, involving 1,020,898 consumers, to further test the new strategy's performance. The results indicate that the model-based dynamic targeting strategy was 39% more effective than a random allocation strategy, while BDRL was 60% more effective than random allocation.
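A rough tabular sketch of the batch (offline) idea: Q-values are fitted by sweeping over a fixed log of past interactions rather than by running new experiments. The states, coupon levels, and revenue numbers below are invented for illustration and are not the paper's model:

```python
import random
from collections import defaultdict

gamma, alpha = 0.9, 0.1
states = ["low_engagement", "medium_engagement", "high_engagement"]
actions = ["no_coupon", "small_coupon", "big_coupon"]

def fake_log_entry():
    # Stand-in for one logged interaction from historical platform data.
    s = random.choice(states)
    a = random.choice(actions)
    revenue = random.gauss({"no_coupon": 1.0, "small_coupon": 2.0, "big_coupon": 2.5}[a], 1.0)
    return s, a, revenue, random.choice(states)

logged_data = [fake_log_entry() for _ in range(5_000)]

Q = defaultdict(float)
for sweep in range(20):                      # repeated sweeps over the fixed batch, no new data
    for s, a, r, s_next in logged_data:
        best_next = max(Q[(s_next, a2)] for a2 in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

for s in states:
    print(s, "->", max(actions, key=lambda a: Q[(s, a)]))
```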

My question: in deep reinforcement learning, we often find that a DRL model performs better on research data than in live experiments. This gap persists even when we have divided the data into training and testing sets. It could be due to systematic biases in the data (for example, if most horse images in a dataset contain copyright watermarks, the model might use the watermark to recognize horses). How can we better avoid this kind of potential "overfitting"?

uc-diamon commented 3 weeks ago

How can reinforcement learning agents generalize their learning to other tasks and environments in order to reduce computation and training time?

CYL24 commented 3 weeks ago

I would like to recommend the article Contrastive explanations for reinforcement learning in terms of expected consequences. This article addresses the challenge of understanding complex RL models, which is crucial both for trusting a model's results and for making corrections and adjustments to it. The authors propose a novel method for enhancing transparency in reinforcement learning models by providing contrastive explanations based on the expected consequences of actions.

First, unlike previous methods, this method translates states and actions into user-interpretable concepts to facilitate comprehension. The translation is achieved through a set of binary classifiers trained during the agent's exploratory learning process. The method then simulates future states and actions using the RL agent's transition function to gather information about expected state transitions and outcomes. It also allows users to ask contrastive "why" questions and employs value functions to derive the foil policy from the user's question, enabling comparisons between different policies. The final explanations are generated from simulated trajectories of state-action pairs under the learned policy and the foil policy, providing insight into why the agent behaves the way it does.
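A rough sketch of the simulate-and-compare step, assuming a made-up corridor task and two hand-written policies; the paper's method derives the foil policy from the user's question and uses learned classifiers for the interpretable concepts, neither of which is modeled here:

```python
import random

# Roll out the learned policy and a user-proposed "foil" policy, then compare how
# often each leads to human-interpretable outcomes. Everything here is invented.
def rollout(policy, horizon=5):
    outcomes = {"reached_goal": 0, "hit_hazard": 0}
    position = 0
    for _ in range(horizon):
        action = policy(position)
        if action == "risky_shortcut":
            if random.random() < 0.3:
                outcomes["hit_hazard"] += 1          # shortcut sometimes fails
            else:
                position += 2
        else:
            position += 1                            # detour is slow but safe
        if position >= 4:
            outcomes["reached_goal"] += 1
            break
    return outcomes

def expected_consequences(policy, n=2000):
    totals = {"reached_goal": 0.0, "hit_hazard": 0.0}
    for _ in range(n):
        for k, v in rollout(policy).items():
            totals[k] += v
    return {k: v / n for k, v in totals.items()}

learned = lambda pos: "safe_detour"       # what the agent actually does
foil = lambda pos: "risky_shortcut"       # what the user asks "why not?" about

print("learned policy:", expected_consequences(learned))
print("foil policy:   ", expected_consequences(foil))
```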

I think this article might be useful for understanding and "communicating" with RL models to obtain better results in research, since it provides a way to understand the reasons behind the chosen actions.

erikaz1 commented 3 weeks ago

Regarding the Mnih et al. paper (2015), from my understanding, the discount factor in the Q-function acts on the reward for a specific policy. How is the rate of discount determined at each new point in time?
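For what it's worth, my understanding is that the discount in the paper is a single fixed hyperparameter chosen before training (0.99 in the DQN experiments) rather than a rate re-derived at each time step; it enters through the discounted return that the optimal action-value function estimates:

$$
Q^{*}(s,a) = \max_{\pi}\, \mathbb{E}\big[\, r_t + \gamma r_{t+1} + \gamma^{2} r_{t+2} + \cdots \mid s_t = s,\ a_t = a,\ \pi \,\big]
$$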

Pei0504 commented 3 weeks ago

The paper 'Human-level control through deep reinforcement learning' presents a novel artificial agent, the deep Q-network (DQN), which integrates deep neural networks with reinforcement learning to derive efficient representations from high-dimensional sensory inputs. Tested on Atari 2600 games, DQN surpassed previous algorithms and performed comparably to professional human testers. Key innovations include the use of experience replay and a separate target network to stabilize learning. This work marks a significant step towards general AI, capable of learning from raw pixels and game scores, bridging high-dimensional inputs and actions.

How does the deep Q-network (DQN) approach differ from traditional reinforcement learning methods, and what were its key contributions to the field?

Marugannwg commented 3 weeks ago

This paper, “Machine Theory of Mind” (2018), provides a mind-opening but intuitive angle on how to interpret agents. From my understanding, it uses reinforcement learning techniques to do psychology on a system of agents (instead of getting lost in the neuroscience of the "black box"). Just as we don't need to know our friends' brains to understand each other, we can use limited, observable information to infer (something?) about a deep learning model's behavior. (The concern here is that this intuitive theory of understanding still seems like a black box...)

From a more research-oriented perspective, the method in this paper suggests that reinforcement learning could feasibly be applied to agent-based modeling in the study of social systems. This makes me wonder whether reinforcement learning could be used together with agents simulated by LLMs.

anzhichen1999 commented 3 weeks ago

Xuechunzi Bai's paper: Globally Inaccurate Stereotypes Can Result From Locally Adaptive Exploration. https://pubmed.ncbi.nlm.nih.gov/35363094/

Considering the findings from Bai, Fiske, and Griffiths (2021) on the emergence of inaccurate stereotypes through locally adaptive exploration, how can reinforcement learning models be designed to mitigate the formation of such stereotypes in real-world applications? What role do methods like Thompson sampling, state augmentation, and attention mechanisms play in ensuring more accurate and unbiased learning outcomes, particularly in high-dimensional and socially complex environments?
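As a point of reference for the Thompson-sampling part of the question, here is a minimal Beta-Bernoulli sketch with two arbitrarily parameterized "groups" (not the paper's simulations); posterior sampling keeps revisiting both options instead of locking in an early unlucky impression, which seems related to the under-exploration mechanism the paper describes:

```python
import random

# Two objectively identical options; Thompson sampling chooses by sampling from
# each option's Beta posterior rather than greedily exploiting early estimates.
true_success_prob = {"group_A": 0.6, "group_B": 0.6}
alpha = {g: 1 for g in true_success_prob}   # Beta(1, 1) priors
beta = {g: 1 for g in true_success_prob}

choices = {g: 0 for g in true_success_prob}
for trial in range(5000):
    # Sample a plausible success rate for each group, then pick the highest sample.
    sampled = {g: random.betavariate(alpha[g], beta[g]) for g in true_success_prob}
    g = max(sampled, key=sampled.get)
    choices[g] += 1
    if random.random() < true_success_prob[g]:
        alpha[g] += 1
    else:
        beta[g] += 1

print("Choice counts:", choices)
print("Posterior means:", {g: alpha[g] / (alpha[g] + beta[g]) for g in true_success_prob})
```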

kangyic commented 3 weeks ago

For XRL (explainable reinforcement learning), is it possible to convert discrete-time models to continuous-time models, and what are common use cases of continuous-time models in this context?

00ikaros commented 3 weeks ago

How does the theory of reinforcement learning provide a normative account of optimizing control over an environment, and what challenges do agents face when applying reinforcement learning to real-world complex situations? Specifically, how do agents derive efficient representations from high-dimensional sensory inputs, and how do humans and animals seemingly solve this problem through a combination of reinforcement learning and hierarchical sensory processing? Furthermore, what advancements have allowed the development of a deep Q-network (DQN) capable of learning successful policies directly from high-dimensional sensory inputs? In the context of testing on classic Atari 2600 games, how does the DQN achieve performance comparable to professional human testers, and what significance does this achievement have for the applicability of reinforcement learning in diverse and challenging domains?

HamsterradYC commented 3 weeks ago

I would like to recommend the paper Spatial planning of urban communities via deep reinforcement learning

This article discusses an approach to urban planning using a deep reinforcement learning (DRL) model, which optimizes the spatial layout of urban communities. This AI-driven model utilizes a graph-based representation to handle the diverse and complex geometries of urban settings, significantly enhancing planning efficiency compared to traditional methods. By integrating the "15-minute city" concept, the model improves access to essential services, promoting sustainability and reducing the dependence on vehicular transportation. The AI not only autonomously generates efficient urban plans but also collaborates with human planners, thereby speeding up the planning process and potentially improving the quality of urban life.

Social Science Analysis: The methodology presented in the article could revolutionize social science research by providing a framework to explore urban sociology dynamics under various planning scenarios. By simulating different urban planning strategies, researchers can study the impacts of these environments on social behaviors, community engagement, and economic activities. For instance, using DRL models, sociologists could predict how changes in urban design influence social interactions or how accessible amenities contribute to urban well-being.

Social Data: To pilot this application in a social science context, one could use the DRL model to study the impact of urban green spaces on community health outcomes. The data required would include geocoded health records, demographic information, and detailed maps of urban layouts with existing green spaces. Health records and demographic data could be sourced from public health databases or through partnerships with local hospitals, ensuring data privacy and security measures are strictly followed.

My question is how does the DRL model handle the spatial heterogeneity and temporal dynamics of urban environments, particularly in rapidly evolving urban areas, and what are the limitations of its current graph-based approach in capturing these complexities?

Carolineyx commented 3 weeks ago

The paper Reinforcement Learning with Fast and Forgetful Memory advances traditional memory models in reinforcement learning by introducing Fast and Forgetful Memory (FFM). This model improves efficiency and aligns closely with human cognitive processes, addressing challenges in partially observable environments common in real-world applications. FFM's strength lies in its logarithmic time complexity and linear space complexity, making it a better alternative to RNNs, which have slower training times and higher complexity. This efficiency is crucial for applications needing rapid learning and adaptability. The model's ability to achieve higher rewards without hyperparameter tuning highlights its robustness and generalizability.

FFM can significantly enhance social science research by improving the analysis of complex, dynamic social systems. Social interactions in online communities often involve partially observable environments where individuals' actions depend on their memory of past interactions. Using FFM, researchers can model these interactions more effectively, capturing underlying patterns and dynamics with greater accuracy.

To pilot the use of FFM in social science, data from social media platforms like Twitter or Reddit would be ideal. These platforms provide rich, dynamic datasets where user interactions and behaviors evolve over time. By applying FFM, researchers can track how past interactions influence future behaviors, identify emerging trends, and understand the impact of specific interventions or events. This approach could involve:

- Data collection: Gather longitudinal data on user interactions, posts, and engagement metrics.
- Preprocessing: Clean and preprocess the data to create a sequence of user activities and interactions.
- Model implementation: Integrate FFM into an RL framework to model user behavior and predict future interactions.
- Analysis: Analyze the model's performance, focusing on its ability to capture and predict complex social dynamics.

Xtzj2333 commented 3 weeks ago

Zhou, Y., Han, S., Kang, P., Tobler, P. N., & Hein, G. (2024). The social transmission of empathy relies on observational reinforcement learning. Proceedings of the National Academy of Sciences, 121(9), e2313073121.

The authors used the Rescorla-Wagner model to understand how participants adjusted their expectations of empathy after observing and comparing their predictions to the actual empathic reactions of others. The model used visual cues as input, but I wonder if we could improve its explanatory power by making it multi-modal, e.g., by adding auditory and textual cues of empathy too?
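A tiny sketch of how the Rescorla-Wagner update could accommodate multiple cue modalities, with the prediction summed over whichever cues are present on a trial; the modalities, values, and learning rates here are invented and are not those estimated by Zhou et al.:

```python
# Minimal multi-cue Rescorla-Wagner sketch (illustrative values only).
cues = {"visual": 0.0, "auditory": 0.0, "textual": 0.0}   # associative strengths V_i
alpha = {"visual": 0.3, "auditory": 0.2, "textual": 0.1}  # per-modality salience / learning rate

def rw_update(present_cues, observed_empathy):
    # Prediction is the summed strength of all cues present on this trial.
    prediction = sum(cues[c] for c in present_cues)
    error = observed_empathy - prediction          # common prediction error
    for c in present_cues:
        cues[c] += alpha[c] * error                # each present cue absorbs part of the error

# Example trial sequence: an empathic reaction (1.0) observed with all three cue types present.
for _ in range(30):
    rw_update(["visual", "auditory", "textual"], observed_empathy=1.0)

print(cues)
```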

beilrz commented 3 weeks ago

Human-level control through deep reinforcement learning

This research integrates the Q-learning approach with deep learning to achieve significantly better results than previous RL algorithms based on hand-crafted features. I believe a crucial area for further exploration is the generalization capability of the deep Q-network (DQN) to new tasks or domains that were not encountered during training. Furthermore, it would be interesting to examine whether transfer learning techniques can be applied to leverage knowledge gained from previous experiences, including those outside the RL setting, in order to reduce the need for extensive retraining and improve efficiency in new environments.

mingxuan-he commented 3 weeks ago

In the Machine Theory of Mind paper, I believe it would be interesting to compare it to a multi-agent learning environment with transparent state, i.e., where all agents can see each other's inner state variables. It would be even more interesting to observe how agents that hold false beliefs impact other agents' beliefs under this framework.

hantaoxiao commented 3 weeks ago

Human-level control through deep reinforcement learning (2015):

Learning ‘What-if’ Explanations for Sequential Decision-Making (2021):

Brian-W00 commented 2 weeks ago

The paper "Machine Theory of Mind" proposes a neural network - ToMnet - that models the mental state of other agents by observing behavior. How can this network maintain its modeling accuracy and adaptability when facing complex and changing real-world social interaction scenarios? Furthermore, what are the potential challenges and opportunities for such theoretical mental models in understanding and predicting irrational human behavior?

MarkValadez commented 2 weeks ago

Looking at "Machine Theory of Mind," the question that remains in my mind is that there seems to be some disconnect between optimizing over an agent's decision model and predicting the counterfactual outcomes of cases. For example, I do not need to approximate a doctor's risk aversion or utility function to predict the counterfactual of a patient's outcome; in my mind, you only need the track record of the events themselves, not the track record of the choices. Agent modeling does seem relevant when it is our primary focus, i.e., when we observe some performance Y and want to understand the decisions and values with respect to X that led to Y. But otherwise, using the counterfactual alongside the agent-model approximation seems somewhat disconnected. Am I missing something?

Pei0504 commented 2 weeks ago

Based on the article "Human-level control through deep reinforcement learning," how does the integration of deep Q-networks (DQNs) and experience replay contribute to the stability and performance of reinforcement learning agents in complex environments? Specifically, consider the implementation of these techniques in the context of mastering a diverse array of Atari 2600 games. What are the key mechanisms by which experience replay mitigates instability, and how do DQNs enable agents to generalize learning from high-dimensional sensory inputs to develop effective policies?