Closed YunchaoYang closed 1 year ago
Game Theory
Repeated Game Strategies
Finding the best response to a finite-state strategy in a repeated game is equivalent to solving an MDP
Tit-for-tat
Grim Trigger
Pavlov
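The three strategies above can be sketched in a simulated iterated Prisoner's Dilemma. This is a minimal illustration; the payoff values (3/0/5/1) are the standard textbook choices, not taken from these notes.

```python
# Iterated Prisoner's Dilemma with the three finite-state strategies above.
PAYOFF = {  # (my_move, their_move) -> my payoff; C = cooperate, D = defect
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(my_hist, their_hist):
    # Cooperate first, then copy the opponent's previous move.
    return their_hist[-1] if their_hist else "C"

def grim_trigger(my_hist, their_hist):
    # Cooperate until the opponent defects once, then defect forever.
    return "D" if "D" in their_hist else "C"

def pavlov(my_hist, their_hist):
    # Win-stay, lose-shift: keep the last move if the last payoff was
    # high (3 or 5), otherwise switch.
    if not my_hist:
        return "C"
    last_payoff = PAYOFF[(my_hist[-1], their_hist[-1])]
    if last_payoff >= 3:
        return my_hist[-1]
    return "C" if my_hist[-1] == "D" else "D"

def play(strat_a, strat_b, rounds=20):
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strat_a(hist_a, hist_b)
        b = strat_b(hist_b, hist_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

print(play(tit_for_tat, grim_trigger))  # mutual cooperation: (60, 60)
```

Any pair of these strategies cooperates forever when started against each other, which is what makes them candidates for supporting equilibrium cooperation.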
Folk Theorem: any feasible payoff profile that strictly dominates the minmax/security-level profile can be realized as a Nash equilibrium payoff profile, given a sufficiently large discount factor. (Translation of the Chinese note: if the players are sufficiently patient about the future, any degree of cooperation that is feasible and individually rational can be achieved through a subgame-perfect Nash equilibrium.)
In repeated games, the possibility of retaliation opens the door for cooperation.
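A quick numeric check of this intuition: against Grim Trigger in the Prisoner's Dilemma (standard payoffs C/C = 3, D/C = 5, D/D = 1, assumed here for illustration), cooperating forever beats a one-shot defection exactly when the discount factor is large enough.

```python
# Discounted value of cooperating forever vs. defecting once against
# Grim Trigger, as a function of the discount factor gamma.

def cooperate_value(gamma):
    # 3 + 3*gamma + 3*gamma^2 + ... = 3 / (1 - gamma)
    return 3 / (1 - gamma)

def deviate_value(gamma):
    # One-shot gain of 5, then mutual defection (payoff 1) forever.
    return 5 + gamma * 1 / (1 - gamma)

for gamma in (0.3, 0.5, 0.7):
    print(gamma, cooperate_value(gamma) >= deviate_value(gamma))
```

Solving 3/(1-γ) ≥ 5 + γ/(1-γ) gives γ ≥ 1/2: cooperation is self-enforcing once the players value the future enough, which is the folk-theorem condition in miniature.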
Feasible Region
MinMax Profile
(the worst case for one player): the payoff a player can guarantee even when the opponents act to minimize it — the best outcome achievable under the worst possible conditions.
Why is it a profile? A strategy profile is a set of strategies, one for each player, specifying how every player will act in the game; here the strategies may be mixed (randomized).
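The security level is easy to compute for a small matrix game. A minimal sketch, restricted to pure strategies for brevity (the full definition allows mixed strategies, which generally requires a linear program); the payoff matrix below is invented for illustration.

```python
# Pure-strategy security level of the row player in a matrix game.

def security_level(payoffs):
    # payoffs[i][j] = row player's payoff for row action i, column action j.
    # For each row action, assume the opponent picks the worst column;
    # the row player then takes the best of these worst cases.
    return max(min(row) for row in payoffs)

# Example: Prisoner's Dilemma payoffs for the row player.
A = [
    [3, 0],   # cooperate
    [5, 1],   # defect
]
print(security_level(A))  # defecting guarantees at least 1 -> prints 1
```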
Subgame Perfect
Plausible Threats: is there a formal definition in English with good intuition? Intuitively, a threat is plausible only if carrying it out would still be in the threatener's own interest at the moment it must be executed — which is exactly the requirement subgame perfection imposes.
Zero Sum Stochastic Games
Value Iteration works!
Minimax-Q converges
Unique solution to Q*
Policies can be computed independently
Update efficient
Q functions sufficient to specify policy
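The "Value Iteration works" claim can be sketched on a toy example. Below is a simplified minimax value iteration for a zero-sum stochastic game using the backup V(s) = max_a min_o Q(s, a, o); the real Minimax-Q algorithm solves a linear program for the mixed minimax value at each state, and the tiny two-state game here is invented for illustration.

```python
# Minimax value iteration on an invented two-state zero-sum game with
# deterministic transitions, restricted to pure strategies for brevity.
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]    # maximizing player's actions
OPPONENT = [0, 1]   # minimizing player's actions

# reward[s][a][o]: immediate payoff to the max player.
reward = [
    [[1, 0], [0, 2]],
    [[0, -1], [3, 1]],
]
# next_state[s][a][o]: deterministic transitions.
next_state = [
    [[0, 1], [1, 0]],
    [[1, 1], [0, 1]],
]

V = [0.0, 0.0]
for _ in range(200):
    V = [
        max(
            min(reward[s][a][o] + GAMMA * V[next_state[s][a][o]]
                for o in OPPONENT)
            for a in ACTIONS
        )
        for s in STATES
    ]
print([round(v, 2) for v in V])  # converges to [9.0, 10.0]
```

Because the minimax backup is a contraction (just like the ordinary Bellman backup), the iterates converge to the unique V*, which is why Q* is unique in the zero-sum case.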
General Sum Stochastic Games
Value Iteration doesn't work
Minimax-Q doesn't converge
No unique solution to Q*
Policies cannot be computed independently
Update not efficient
Q functions not sufficient to specify policy
Reinforcement Learning: A Survey
Put an agent into a world (make sure you can describe it with an MDP!), give it rewards and penalties, and hopefully it will learn.
Markov Decision Processes
Model-Based vs. Model-Free
Three types of RL
Q Learning
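A minimal tabular Q-learning sketch on a toy chain MDP — the environment, hyperparameters, and episode count are all invented for illustration, not taken from any specific source.

```python
import random

# Chain MDP: states 0..4, actions left (-1) / right (+1);
# reward 1 only for reaching the terminal state 4.
random.seed(0)
N_STATES, ACTIONS = 5, [-1, +1]
GAMMA, ALPHA, EPS = 0.9, 0.5, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r, s2 == N_STATES - 1

for _ in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        # off-policy TD target: r + gamma * max_a' Q(s', a')
        target = r + (0.0 if done else GAMMA * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# The greedy policy should be "go right" in every non-terminal state.
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)]
print(policy)
```

Note that Q-learning is model-free: the agent never sees the transition or reward functions, only sampled (s, a, r, s') transitions.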
What is entropy, and how is it calculated?
What is joint entropy, and how does it relate to mutual information?
What is K-L divergence?
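Sketch answers to the three questions above for discrete distributions, with the standard definitions; the example joint distribution is invented for illustration.

```python
from math import log2

def entropy(p):
    # H(X) = -sum_x p(x) * log2 p(x)
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def joint_entropy(pxy):
    # H(X, Y) over a joint probability table pxy[x][y]
    return entropy([p for row in pxy for p in row])

def mutual_information(pxy):
    # I(X; Y) = H(X) + H(Y) - H(X, Y)
    px = [sum(row) for row in pxy]
    py = [sum(col) for col in zip(*pxy)]
    return entropy(px) + entropy(py) - joint_entropy(pxy)

def kl_divergence(p, q):
    # D_KL(P || Q) = sum_x p(x) * log2(p(x) / q(x))
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

pxy = [[0.25, 0.25], [0.25, 0.25]]   # two independent fair coins
print(entropy([0.5, 0.5]))           # 1.0 bit
print(mutual_information(pxy))       # 0.0 -- independence means no shared info
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

Mutual information is exactly the gap between the sum of marginal entropies and the joint entropy, so it vanishes iff X and Y are independent; K-L divergence is asymmetric and nonnegative.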
Information Theory
Feature Selection
Feature Transformation
2 Unsupervised Learning
UL consists of algorithms that are meant to "explore" on their own and provide the user with valuable information concerning their dataset/problem
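A classic instance of this "explore on your own" idea is k-means clustering, which groups points without any labels. A minimal sketch; the two-blob dataset and k = 2 are invented for illustration.

```python
import random

# Two well-separated Gaussian blobs, no labels given to the algorithm.
random.seed(1)
data = ([(random.gauss(0, 0.3), random.gauss(0, 0.3)) for _ in range(20)] +
        [(random.gauss(5, 0.3), random.gauss(5, 0.3)) for _ in range(20)])

def kmeans(points, k, iters=50):
    centers = random.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda i: (p[0] - centers[i][0]) ** 2
                                + (p[1] - centers[i][1]) ** 2)
            clusters[i].append(p)
        # Update step: each center moves to its cluster's mean.
        centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers

centers = sorted(kmeans(data, 2))
print(centers)  # two centers, one near (0, 0) and one near (5, 5)
```

With no supervision at all, the algorithm recovers the two groups — the kind of "valuable information about the dataset" the note refers to.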