Chapter 6 اساسيات علم البيانات والذكاء الاصطناعي

Summary of the chapter in the form of points

This chapter focuses on competitive environments and adversarial search problems, where multiple agents have conflicting goals.
The chapter emphasizes games like chess, Go, and poker as examples of competitive environments.
The simplified nature of these games is advantageous for AI researchers, as game states are easily represented and agents have a limited set of actions with precise rules.
Physical games such as croquet and ice hockey have more complex descriptions, a wider range of possible actions, and less precise rules governing the legality of actions.
Except for robot soccer, these physical games have not received much attention in the AI community.
There are three approaches to dealing with multi-agent environments in game theory.
The first approach is to treat the agents as part of the overall economy, allowing predictions about aggregate behavior without considering individual agent actions.
The second approach is to view adversarial agents as part of the environment, making the environment nondeterministic. However, this overlooks the fact that adversaries actively try to defeat us.
The third approach, covered in this chapter, is to explicitly model adversarial agents using adversarial game-tree search techniques.
The chapter starts with a discussion of a restricted class of games and introduces the concept of optimal moves and the minimax search algorithm, which is an extension of AND-OR search.
Pruning is introduced as a technique to make the search more efficient by disregarding parts of the search tree that do not affect the optimal move.
Due to time constraints, finding the optimal move in nontrivial games may not be feasible, even with pruning, and the search needs to be cut off at some point.
The evaluation of each stopping state involves determining who is winning, which can be done through a heuristic evaluation function based on state features or by averaging outcomes from simulations of the game.
The chapter also covers games involving chance (e.g., dice rolling, card shuffling) and games with imperfect information (e.g., poker, bridge).
The games commonly studied in AI, such as chess and Go, are deterministic, two-player, turn-taking, perfect information, zero-sum games.
"Perfect information" means that the game state is fully observable by all players, and "zero-sum" implies that what benefits one player harms the other.
The two players are referred to as MAX and MIN, with MAX moving first.
A game is defined by its initial state (S0), the player whose turn it is (TO-MOVE(s)), the set of legal moves (ACTIONS(s)), the transition model (RESULT(s, a)), the terminal test (IS-TERMINAL(s)), and the utility function (UTILITY(s, p)).
The utility function assigns a numeric value to each terminal state, indicating the final outcome for each player.
The state space graph is formed by the initial state, legal actions, and resulting states, while the search tree is a subset of the state space graph used for determining the best move.
The complete game tree encompasses every possible sequence of moves until reaching a terminal state, which may be infinite in some cases.
The game tree for tic-tac-toe is shown as an example, with MAX (X) and MIN (O) taking turns placing their symbols until a terminal state is reached.
The game tree for more complex games like chess is too large to be practically realized in the physical world.
In chess, the outcome is typically win, loss, or draw, represented by values of 1, 0, or 1/2, respectively.
The game tree and search tree concepts are similar to those discussed in Chapter 3.
Different terms like "imperfect information game" and "partially observable game" are used for games with hidden information or limited visibility of the environment.
Chess, although not strictly zero-sum, is considered as such due to the symmetric nature of outcomes.
The game tree and utility values help in determining the optimal moves and strategies for players.
In games where MAX aims to win, MAX's strategy must be a conditional plan that considers MIN's possible moves.
The optimal strategy in such games can be obtained using minimax search, which determines the minimax value of each state in the game tree.
The minimax value of a state (MINIMAX(s)) represents the utility (for MAX) of being in that state, assuming both players play optimally.
Terminal states have their utility values directly, while non-terminal states consider the maximum value for MAX and the minimum value for MIN.
The minimax value of a state is determined as follows:

If the state is terminal, its minimax value is its utility value.

If it is MAX's turn to move, the minimax value is the maximum of the minimax values of its successor states.

If it is MIN's turn to move, the minimax value is the minimum of the minimax values of its successor states.

The optimal move for MAX is the one that leads to the state with the highest minimax value.
The assumption in optimal play is that both players play optimally. If MIN plays suboptimally, MAX will still perform at least as well as against an optimal player.
However, in certain situations, it may be advantageous for MAX to deviate from the optimal move when facing a suboptimal opponent, especially if there is a higher-risk move that leads to multiple reasonable responses from MIN, some of which are advantageous for MAX.
The minimax search algorithm finds the best move for MAX by recursively exploring the game tree and determining the minimax value for each state.
The algorithm consists of three main functions: MINIMAX-SEARCH, MAX-VALUE, and MIN-VALUE.
MINIMAX-SEARCH takes the game and the current state as input and returns the optimal action for MAX.
MAX-VALUE and MIN-VALUE functions are recursive and calculate the backed-up value of a state and the move leading to it.
The algorithm starts at the root node and recursively explores the tree until it reaches terminal states.
At each MAX node, it chooses the action that maximizes the backed-up value.
At each MIN node, it chooses the action that minimizes the backed-up value.
The algorithm backtracks and propagates the backed-up values up the tree as the recursion unwinds.
The time complexity of the minimax algorithm is O(b^m), where b is the branching factor and m is the maximum depth of the tree.
The space complexity depends on how actions are generated, either O(b^m) if all actions are generated at once or O(m) if actions are generated one at a time.
The exponential complexity of minimax makes it impractical for complex games with large branching factors and depths.
Approximations and optimizations can be applied to derive more practical algorithms based on minimax analysis.
The minimax algorithm serves as a foundation for the mathematical analysis of games and provides insights into optimal play.
Extending the minimax idea to multiplayer games involves using a vector of values for each node, where each value represents the utility from the viewpoint of a different player.
The UTILITY function returns a vector of utilities for terminal states, providing the utilities from each player's perspective.
Nonterminal states require considering the choices of multiple players. The backed-up value of a node is the utility vector of the successor state with the highest value for the player choosing at that node.
Alliances are common in multiplayer games, and they can emerge as a result of optimal strategies for each player.
Weaker players may collaborate against a stronger player to improve their chances of success.
Collaboration can also occur in non-zero-sum games, where players work together to achieve a mutually desirable outcome.
Breaking an alliance may come with social consequences or the risk of being perceived as untrustworthy.
The complexities of alliances and collaboration in multiplayer games are further explored in Section 17.2.
Alpha-beta pruning is a technique used to reduce the number of states evaluated in the minimax algorithm.
By pruning parts of the game tree that do not affect the final outcome, we can significantly decrease the computational cost.
Alpha-beta pruning allows us to identify the optimal decision without evaluating certain leaf nodes.
The technique involves maintaining two parameters: alpha (α) and beta (β), which represent the best choices found so far for MAX and MIN players, respectively.
During the search, if a player has a better choice available elsewhere in the tree (at the same level or higher), the current node can be pruned.
Alpha represents the minimum value that MAX is assured of, while beta represents the maximum value that MIN is assured of.
The pruning can occur at any level of the tree, not just at the leaves.
Alpha-beta pruning is a depth-first search algorithm, examining nodes along a single path in the tree at any given time.
The values of alpha and beta are updated as the search progresses, allowing for early termination of branches that are known to be worse than the current values.
The complete algorithm for alpha-beta pruning is provided in Figure 6.7, and Figure 6.5 illustrates the progress of the algorithm on a game tree.
The effectiveness of alpha-beta pruning in reducing the number of evaluated states depends heavily on the order in which states are examined.
The order of state examination can significantly impact the pruning ability of alpha-beta pruning.
Examining the worst successors first may prevent pruning of other successors, while examining the best successors first can lead to more effective pruning.
Ideally, if the best move ordering could be determined perfectly, alpha-beta pruning would only need to examine O(b^(m/2)) nodes instead of O(b^m) in minimax, significantly reducing the computational cost.
Achieving perfect move ordering is not feasible, but using heuristics and dynamic move-ordering schemes can come close to improving the effectiveness of alpha-beta pruning.
Dynamic move-ordering schemes, such as trying previously found best moves or using iterative deepening, can improve move ordering and lead to better pruning.
Killer moves, which are the best moves discovered in previous searches, can be tried first in move ordering to enhance pruning.
Redundant paths to repeated states in game tree search can be addressed using a transposition table that caches the heuristic values of states, enabling efficient lookup and avoiding repeated searches.
Transposition tables are particularly effective in games like chess, allowing for a significant increase in search depth within the same amount of time.
Despite the benefits of alpha-beta pruning and move ordering, minimax alone is not sufficient for complex games like chess and Go due to the large number of states. Strategies such as Type A (wide and shallow exploration) and Type B (deep and narrow exploration) have been proposed to address this challenge.
Type A ststrategirategies consider all possible moves to a certain depth and use a heuristic evaluation function to estimate state utilities, while Type B strategies focus on promising lines and explore deeply but narrowly.
Chess programs have historically used Type A strategies, while Go programs have leaned towards Type B strategies. However, Type B programs have also shown strong performance in chess and other games.
To utilize limited computation time in alpha-beta pruning, we can introduce a cutoff test that determines when to stop the search and apply a heuristic evaluation function to nonterminal nodes.
The heuristic evaluation function, EVAL, estimates the utility of a state instead of using the actual UTILITY function.
The cutoff test replaces the terminal test and determines when to stop the search based on the search depth and any state properties considered relevant.
The heuristic minimax value, denoted as H-MINIMAX(s, d), represents the estimated utility of state s at search depth d.
If the cutoff test determines that a state s at depth d should be cut off, the heuristic evaluation EVAL is applied, returning the estimated utility for MAX.
If the cutoff test does not apply and it is MAX's turn to move, the maximum heuristic minimax value among the successor states is recursively calculated at depth d+1.
If the cutoff test does not apply and it is MIN's turn to move, the minimum heuristic minimax value among the successor states is recursively calculated at depth d+1.
A heuristic evaluation function EVAL(s, p) estimates the expected utility of state s for player p and should be computationally efficient.
The evaluation function should be strongly correlated with the actual chances of winning.
The evaluation function must satisfy the following conditions:

For terminal states, EVAL(s, p) should equal UTILITY(s, p).

For nonterminal states, the evaluation should fall between the utility values of a loss and a win: UTILITY(loss, p) ≤ EVAL(s, p) ≤ UTILITY(win, p).

Evaluation functions typically calculate features of the state, such as the number of pawns, queens, etc., to categorize states based on their feature values.

The evaluation function estimates the proportion of states in each category that lead to a win, loss, or draw, providing a single value representing the expected value for that category.
In practice, evaluation functions compute separate numerical contributions from each feature and combine them using a weighted linear function.
A weighted linear function EVAL(s) can be expressed as EVAL(s) = w1f1(s) + w2f2(s) + ... + wnfn(s), where fi(s) represents a feature and wi represents its weight.
The weights should be normalized to ensure the sum falls within the range of a loss (0) to a win (+1).
While the evaluation function should be strongly correlated with the chances of winning, it does not need to have a linear correlation.
Nonlinear combinations of features are often used in evaluation functions to account for feature interdependencies and contextual factors.
The features and weights in evaluation functions are derived from human chess-playing experience, and in the absence of such experience, machine learning techniques can be applied to estimate the weights.
Machine learning has shown promising results in replicating the knowledge gained from centuries of human experience in a relatively short time.
In order to apply the heuristic evaluation function and cut off the search when appropriate, modifications are made to the ALPHA-BETA-SEARCH algorithm.
The lines in the algorithm that involve IS-TERMINAL are replaced with the following line: "if game.IS-CUTOFF(state, depth) then return game.EVAL(state, player), null".
Bookkeeping is required to increment the current depth on each recursive call.
One approach to controlling the search depth is to set a fixed depth limit, where IS-CUTOFF(state, depth) returns true for depths greater than a fixed depth value (as well as for terminal states).
Another approach is iterative deepening, where the search is performed with increasing depth limits until time runs out. The move selected by the deepest completed search is returned when time expires.
Iterative deepening improves move ordering and allows the use of transposition table entries from previous rounds, making subsequent rounds faster.
Errors can occur due to the approximate of the evaluation function. For example, a simple evaluation function based on material advantage in chess may misjudge the outcome of a position.
Quiescence search is employed to ensure that the evaluation function is applied only to quiescent positions, where there are no pending moves that would significantly impact the evaluation.
Non-quiescent positions continue the search until quiescent positions are reached. Quiescence search may focus on specific types of moves, such as capture moves, to quickly resolve uncertainties in the position.
The horizon effect is a challenge where delaying tactics can temporarily hide an opponent's strong move beyond the search horizon.
Singular extensions are a strategy to mitigate the horizon effect. Moves that are "clearly better" than other moves in a given position are given an opportunity to extend the search, even if the search would normally be cut off.
Singular extensions deepen the tree, but they are typically few in number, resulting in an effective mitigation of the horizon effect.
Forward pruning is a strategy that prunes moves that appear to be poor but may actually be good moves, saving computation time at the risk of making errors.
Beam search is an approach to forward pruning where only a "beam" of the n best moves, determined by the evaluation function, is considered at each ply instead of all possible moves. However, this approach can be risky as the best move might be pruned.
The PROBCUT algorithm is a forward-pruning version of alpha-beta search that uses statistics from prior experience to reduce the chances of pruning the best move. It estimates the probability that a node's score at a certain depth would be outside the (α,β) window based on a shallow search and past experience.
Late move reduction is another technique that assumes well-ordered moves, where moves appearing later in the list are less likely to be good moves. Instead of pruning them completely, the depth of the search for these moves is reduced, saving time. If the reduced search returns a value above the current α value, a re-run with the full depth is performed.
Combining these techniques results in a program capable of playing competitive chess. A program with minimax search and a million nodes per second could look ahead only five ply in about a minute, which is not enough to compete effectively. With alpha-beta search and a large transposition table, around 14 ply can be achieved, reaching an expert level of play.
To achieve grandmaster status, extensive tuning of the evaluation function and a large database of endgame moves are necessary, in addition to high computational power. Top chess programs like STOCKFISH can reach depth 30 or more in the search tree, surpassing human players' abilities.
Chess programs often rely on table lookup rather than search for the opening and endgame phases of the game.
In the opening, the computer can use tables that contain expert advice and statistics gathered from previously played games. This allows the program to make moves based on well-established opening sequences.
As the game progresses and more moves are made, the positions encountered become less common, and table lookup becomes less effective. At this point, the program needs to switch to search-based methods.
During the endgame, where there are fewer possible positions, table lookup becomes easier. Computers have advanced analysis capabilities for endgames, surpassing human abilities.
Computer analysis of endgames involves retrograde minimax search, where the computer generates a table or policy mapping every possible state to the best move in that state. This is achieved by considering all possible placements of the endgame pieces and determining wins, losses, or draws for each position.
Retrograde minimax search is used to construct extensive lookup tables for endgames with seven or fewer pieces. These tables can contain trillions of positions and provide an infallible guide for optimal play in those endgame scenarios.
The game of Go presents challenges for traditional alpha-beta tree search due to its high branching factor and the difficulty of defining a good evaluation function.
Monte Carlo tree search (MCTS) is a strategy used in modern Go programs as an alternative to alpha-beta search.
MCTS estimates the value of a state by averaging the utility over a number of simulations (playouts) of complete games starting from that state.
Playouts are performed by choosing moves according to a playout policy, which can be learned from self-play or using game-specific heuristics.
Pure Monte Carlo search involves conducting a fixed number of simulations starting from the current state and selecting the move with the highest win percentage.
MCTS uses a selection policy, such as UCT (Upper Confidence Bounds applied to Trees), to guide the search towards promising parts of the game tree by balancing exploration and exploitation.
The UCT formula calculates a score for each node based on the average utility, the number of playouts, and exploration term that favors less explored nodes.
The MCTS algorithm consists of four steps: selection, expansion, simulation, and back-propagation.
The move with the highest number of playouts is typically chosen as the best move, rather than the move with the highest average utility, to account for uncertainty.
MCTS is particularly useful for games with high branching factors or where defining a good evaluation function is challenging.
MCTS can be combined with evaluation functions or alpha-beta search techniques for improved performance.
Monte Carlo search can be applied to new games without pre-defined evaluation functions by relying solely on the game rules.
MCTS has limitations, such as potential overlook of critical moves and longer playouts required to verify obvious wins.
Monte Carlo search shares similarities with reinforcement learning, as it involves simulating moves, observing outcomes, and using them to determine good moves.
Stochastic games incorporate a random element, such as dice rolls, to simulate the unpredictability of real-life situations.
Backgammon is an example of a stochastic game that combines luck and skill.
In backgammon, players move their pieces clockwise (Black) or counterclockwise (White) on a board towards their respective goals.
Each player has a set of legal moves based on the dice roll, but the outcome of the opponent's roll is unknown.
Constructing a game tree for backgammon requires including chance nodes in addition to MAX and MIN nodes.
Chance nodes represent the possible dice rolls, and each branch is labeled with the roll and its probability.
The expectiminimax value is used to make decisions in games with chance nodes. It calculates the expected value of a position by averaging over all possible outcomes of chance nodes.
Terminal nodes and MAX and MIN nodes function similarly to deterministic games, with the caveat that the legal moves for MAX and MIN depend on the outcome of the previous chance node.
The expectiminimax value is computed by recursively evaluating the utility of terminal nodes, selecting the maximum value for MAX nodes, the minimum value for MIN nodes, and calculating the weighted sum of expected values for chance nodes.
Stochastic games introduce randomness into gameplay, as seen in backgammon.
Backgammon involves moving pieces based on dice rolls and has chance nodes in its game tree.
The expectiminimax value is used to determine the best move in backgammon, considering the average outcome of chance events.
Applying an evaluation function to each leaf in expectiminimax allows for an approximation of the best move.
Evaluation functions in games of chance, such as backgammon, need to be more careful due to the presence of chance nodes.
The values assigned by the evaluation function should represent a positive linear transformation of the probability of winning or expected utility.
Small changes in evaluation values can lead to significant changes in the program's behavior and preferred moves.
Solving a game with dice would be similar to solving a game without dice if the program knew all future dice rolls, but expectiminimax considers all possible dice-roll sequences.
The time complexity of expectiminimax is O(b^mn), where b is the branching factor, m is the maximum depth of the game tree, and n is the number of distinct rolls.
The high computational cost of expectiminimax makes it unrealistic to search very far ahead in most games of chance.
The unpredictability of dice rolls makes it challenging to determine likely sequences of moves, rendering detailed plans of action ineffective.
Alpha-beta pruning can be applied to game trees with chance nodes, including chance node pruning by finding upper bounds on their values without evaluating all children.
In games with a high branching factor for chance nodes, forward pruning or Monte Carlo tree search may be considered as alternatives to using an evaluation function.
Evaluation functions in games of chance must consider the relationship between values and probabilities.
Expectiminimax considers all possible dice-roll sequences, making it computationally expensive.
Planning far ahead in games with chance becomes challenging due to the unpredictability of future events.
Alpha-beta pruning can be adapted for game trees with chance nodes, and chance node pruning is possible by setting upper bounds on their values.
Alternative approaches like forward pruning or Monte Carlo tree search can be considered in games with high branching factors for chance nodes.
Partially observable games differ from previous games by incorporating the element of partial observability, similar to the "fog of war" in real wars.
In real wars, the location of enemy units is often unknown until discovered through direct contact, leading to the use of scouts, spies, concealment, and bluff.
Partially observable games, like StarCraft, present additional challenges as they are partially observable, involve multiple agents, have nondeterministic elements, are dynamic, and involve unknown factors.
In deterministic partially observable games, uncertainty stems from a lack of access to the opponent's choices, rather than random events.
Examples of deterministic partially observable games include Battleship, where each player's ships are hidden, and Stratego, where piece types are hidden.
The game of Kriegspiel is a partially observable variant of chess where the opponent's pieces are completely invisible.
Other games, such as Phantom Go, Phantom tic-tac-toe, and Screen Shogi, also have partially observable versions.
Kriegspiel is a partially observable variant of chess where each player can only see their own pieces, while a referee can see the entire board.
In Kriegspiel, the player proposes a move that would be legal if there were no opponent's pieces. If the move is prevented by the opponent's pieces, the referee announces "illegal," and the player continues proposing moves until a legal one is found.
The referee periodically makes announcements, such as "Capture on square X" or "Check by D," where D represents the direction of the check.
The belief state in Kriegspiel represents all logically possible board states given the complete history of percepts so far.
State estimation in Kriegspiel involves updating the belief state based on the predictable outcome of the player's own move and the unpredictable outcome of the opponent's reply.
In Kriegspiel, a winning strategy is one that leads to checkmate for every possible board state in the belief state, regardless of the opponent's moves.
The AND-OR search algorithm can be applied to the belief-state space to find guaranteed checkmates, and the incremental belief-state algorithm is effective for midgame checkmates.
Kriegspiel also allows for probabilistic checkmates, where the winning player's moves are randomized and the checkmate works with a high probability in every board state in the belief state.
Sometimes accidental checkmates can occur if the opponent's pieces happen to be in the right positions.
Determining the likelihood of winning strategies and the true board state probabilities requires considering optimal randomized strategies and equilibrium solutions, which are currently challenging to compute for Kriegspiel.
Current approaches in Kriegspiel involve bounded-depth look-ahead in the player's own belief-state space, with evaluation functions considering the size of the belief state.
Further exploration of partially observable games will be covered under the topic of Game Theory in Section 17.2.
Card games like bridge, whist, hearts, and poker involve stochastic partial observability due to the random dealing of cards.
A common approach for solving these games is to treat the start of the game as a chance node with every possible deal as an outcome and use the EXPECTIMINIMAX formula to select the best move.
This approach, known as averaging over clairvoyance, assumes that once the actual deal has occurred, the game becomes fully observable to both players. However, it fails to consider the belief state the agent will be in after acting and ignores actions that gather information, hide information, or bluff.
Averaging over clairvoyance can be effective with some tricks, such as abstraction (treating similar hands as identical) and forward pruning (considering a small random sample of deals).
In games like bridge, where bidding occurs before play, players' bids provide some information about the probability of each deal, complicating decision-making.
Computers have achieved superhuman performance in poker using techniques like abstraction and monitoring opponents' strategies to plug holes in abstractions.
The computational costs involved in achieving world champion game play suggest that researchers with limited budgets may face limitations. Access to supercomputers or specialized hardware can be advantageous, but training can also be done via crowdsourcing.
There are limitations to game search algorithms, and new approaches are needed to address challenges like imperfect information, large state spaces, and limited computational resources.
Game search algorithms, such as alpha-beta search and Monte Carlo search, make assumptions and approximations due to the intractability of calculating optimal decisions in complex games.
Alpha-beta search is vulnerable to errors in the heuristic function, which can lead to incorrect moves. The dependencies between the values of sibling nodes make it difficult to compensate for errors.
Both alpha-beta and Monte Carlo search are designed to calculate values of legal moves, but they may waste computation time when there is an obvious best move. A better search algorithm would consider the utility of node expansions and prioritize high-utility expansions likely to lead to significantly better moves.
Metareasoning, the reasoning about reasoning, can help optimize the allocation of computational resources in game search algorithms.
Humans play games differently by reasoning at a more abstract level and considering higher-level goals. Incorporating higher-level planning and abstract representations into game search algorithms can improve their performance.
The ability to incorporate machine learning into game search processes is a significant issue. Early game programs relied on human expertise, but recent advancements like ALPHAZERO have utilized machine learning from self-play. Machine learning is covered in more detail in Chapter 19.
Games can be defined by their initial state, legal actions, action results, terminal test, and utility function.
Minimax algorithm is used in two-player, deterministic, turn-taking, zero-sum games with perfect information to find optimal moves by exploring the game tree.
Alpha-beta search improves the efficiency of minimax by pruning irrelevant subtrees.
Heuristic evaluation functions estimate the utility of a state when the entire game tree cannot be explored.
Monte Carlo tree search (MCTS) evaluates states by simulating playthroughs of the game and averaging the results.
Precomputed tables of best moves are used in some games to improve search efficiency.
Games of chance are handled using expectiminimax, which averages the utility of chance nodes based on the probabilities of their outcomes.
Games of imperfect information require reasoning about belief states, and approximations can be made by averaging values over possible configurations of missing information.
Game-playing programs have defeated human champions in various games, but humans still excel in games of imperfect information.
Programs have achieved competitive performance in video games, partly due to their ability to perform actions quickly.
Charles Babbage (1846): Discussed the feasibility of computer chess and checkers but did not understand the exponential complexity of search trees.
Leonardo Torres y Quevedo (1890): Built the first game-playing machine specializing in the "KRK" chess endgame.
Ernst Zermelo (1912): Developed the minimax algorithm, which forms the basis of modern game-tree search.
Claude Shannon (1950): Published "Programming a Computer for Playing Chess," which laid out major ideas for game-playing AI, including board representation, evaluation function, and selective game-tree search.
John McCarthy (1956): Conceived the idea of alpha-beta search.
I. J. Good (1965a): Stated that programming a computer to play a reasonable game of Go would be even more difficult than chess.
Arthur Samuel (1959, 1967): Developed a checkers program that learned its own evaluation function through self-play using reinforcement learning.
Jonathan Schaeffer (1992): His CHINOOK checkers program challenged and eventually defeated Marion Tinsley, the world champion.
ALPHAGO (2016): Defeated top-ranked Go professionals, Lee Sedol and Ke Jie, using a combination of Monte Carlo tree search, deep neural networks, and reinforcement learning.
ALPHAZERO (2017): Surpassed ALPHAGO and defeated top programs in chess, shogi, and Go through self-play and without any expert human knowledge.
MUZERO (2018): Operates without being told the rules of the game and figures out the rules through self-play. Achieved state-of-the-art results in multiple games.
Other mentions: Game milestones, such as Belle, Deep Thought, and Deep Blue in chess; advances in Othello, backgammon, and poker AI.

summary of the informationin the form of a table ;)

Topic	Summary
Competitive Environments	This chapter focuses on competitive environments and adversarial search problems in multi-agent scenarios. Games like chess, Go, and poker are used as examples of competitive environments. Physical games have more complex descriptions and rules compared to board games, but they have received less attention in the AI community.
Approaches to Multi-Agent Environments in Game Theory	Three approaches to dealing with multi-agent environments in game theory are discussed.
Minimax Search Algorithm	The chapter covers the minimax search algorithm, which is used to determine optimal moves in adversarial game-tree search. Pruning is introduced as a technique to make the search more efficient by disregarding parts of the search tree that do not affect the optimal move. Evaluation of stopping states involves determining who is winning using a heuristic evaluation function or averaging outcomes from game simulations.
Games with Chance and Imperfect Information	Games involving chance and imperfect information are discussed. The chapter explains the concepts of game states, legal moves, transition models, terminal tests, and utility functions. The game tree and search tree concepts are introduced, along with their relevance to determining optimal moves and strategies.
Alpha-Beta Pruning	The minimax search algorithm is explained in detail, including the functions MINIMAX-SEARCH, MAX-VALUE, and MIN-VALUE. The time and space complexity of the minimax algorithm are discussed. Alpha-beta pruning is introduced as a technique to reduce the number of evaluated states in the minimax algorithm. The effectiveness of alpha-beta pruning depends on the order in which states are examined. Move ordering schemes and transposition tables are discussed as ways to improve alpha-beta pruning.
Type A and Type B Strategies	Type A and Type B strategies are proposed to address the complexity of games with a large number of states. Cutoff tests and heuristic evaluation functions are introduced to handle limited computation time in alpha-beta pruning. The evaluation function should estimate the expected utility of a state and be strongly correlated with the chances of winning.
Evaluation Functions	Evaluation functions are used in game-playing programs to estimate the likelihood of winning, losing, or drawing in a given game state. They combine weighted features derived from human experience or machine learning techniques. The weights in the evaluation function should be normalized to fall within the range of 0 to 1, representing a loss to a win, respectively. Nonlinear combinations of features are often used to capture feature interdependencies and contextual factors.
Enhancements to Alpha-Beta Search	The ALPHA-BETA-SEARCH algorithm is modified to incorporate the evaluation function and cutoffs based on a fixed depth limit or iterative deepening. Quiescence search ensures the evaluation function is applied only to quiescent positions without pending moves that significantly impact the evaluation. The horizon effect refers to delaying tactics that temporarily hide strong opponent moves beyond the search horizon. Singular extensions and forward pruning are techniques to mitigate this effect. Beam search is a form of forward pruning where only the n best moves are considered at each ply, but it carries the risk of pruning the best move. PROBCUT is a forward-pruning algorithm that estimates the probability of pruning the best move based on shallow searches and past experience. Late move reduction reduces the search depth for moves appearing later in the move ordering, saving computation time. A combination of these techniques can result in competitive chess-playing programs, with
depths reaching expert or even grandmaster level.
Table Lookup and Search-Based Methods for Chess	Table lookup is often used for opening and endgame phases of chess, while search-based methods are used for the midgame. Computer analysis of endgames involves retrograde minimax search to generate extensive lookup tables for optimal play in endgame scenarios.
Monte Carlo Tree Search (MCTS)	Monte Carlo tree search (MCTS) is an alternative to alpha-beta search, commonly used in Go programs, estimating state values through simulations and balancing exploration and exploitation using selection policies like UCT. The UCT formula calculates a score for each node in the Monte Carlo Tree Search (MCTS) algorithm, taking into account the average utility, number of playouts, and an exploration term. The MCTS algorithm consists of four steps: selection, expansion, simulation, and back-propagation. MCTS is particularly useful for games with high branching factors or where defining a good evaluation function is challenging. MCTS can be combined with evaluation functions or alpha-beta search techniques for improved performance. MCTS can be applied to new games without pre-defined evaluation functions by relying solely on the game rules. MCTS has limitations, such as potentially overlooking critical moves and requiring longer playouts to verify obvious wins. Monte Carlo search shares similarities with reinforcement learning, involving simulating moves, observing outcomes, and using them to determine good moves.
Stochastic Games and Backgammon	Stochastic games incorporate a random element, such as dice rolls, to simulate unpredictability. Backgammon is an example of a stochastic game that combines luck and skill, where players move their pieces based on dice rolls. Constructing a game tree for backgammon requires including chance nodes for representing dice rolls and their probabilities. The expectiminimax value is used to make decisions in games with chance nodes, calculating the expected value by averaging over all possible outcomes. Evaluation functions in games of chance, like backgammon, need to consider the relationship between values and probabilities. The computational cost of expectiminimax makes it challenging to search far ahead in most games of chance.
Partially Observable Games and Kriegspiel	Partially observable games introduce additional challenges due to the element of partial observability, similar to the "fog of war" in real wars. Deterministic partially observable games involve uncertainty about the opponent's choices rather than random events. Kriegspiel is a partially observable variant of chess where each player can only see their own pieces, and a referee announces the outcome of moves. The belief state in Kriegspiel represents all logically possible board states given the history of percepts so far. Solving Kriegspiel involves updating the belief state based on the predictable outcome of the player's own moves and the unpredictable outcome of the opponent's replies. The AND-OR search algorithm and the incremental belief-state algorithm are used to find winning strategies in Kriegspiel.
Card Games with Random Dealing	Card games with random dealing, like bridge and poker, can be solved by treating the start of the game as a chance node and using expectiminimax. Averaging over clairvoyance, which assumes full observability after the actual deal, can be effective in some games, but it ignores the belief state and information-gathering actions.
Limitations and Optimization in Game Search Algorithms	Game search algorithms have limitations and approximations due to the complexity of calculating optimal decisions. Metareasoning can optimize the allocation of computational resources in game search algorithms. Higher-level planning and abstract representations can improve the performance of game search algorithms. Humans still excel in games of imperfect information.
Historical Milestones
Historical milestones include early discussions by Babbage, development of the minimax algorithm by Zermelo, and breakthroughs by Shannon, McCarthy, Samuel, and more recent achievements by AlphaGo, AlphaZero, and MuZero.

Arabic Term	English Translation
محيطات تنافسية	Competitive Environments
مشاكل البحث المعادية	Adversarial Search Problems
سيناريوهات متعددة الوكلاء	Multi-Agent Scenarios
خوارزمية البحث Minimax	Minimax Search Algorithm
التقليم	Pruning
وظيفة التقييم	Evaluation Function
حالات الإيقاف	Stopping States
الفرصة والمعلومات غير الكاملة	Chance and Imperfect Information
حالات اللعبة	Game States
الحركات القانونية	Legal Moves
نماذج التحول	Transition Models
اختبارات النهاية	Terminal Tests
وظائف الفائدة	Utility Functions
شجرة اللعبة	Game Tree
شجرة البحث	Search Tree
تعقيد الوقت والمساحة	Time and Space Complexity
التقليم ألفا-بيتا	Alpha-Beta Pruning
مخططات ترتيب الحركات	Move Ordering Schemes
جداول التبديل	Transposition Tables
اختبارات الانقطاع	Cutoff Tests
وظائف التقييم الاستدلالي	Heuristic Evaluation Functions
تأثير الأفق	Horizon Effect
البحث بالشعاع	Beam Search
البحث الهادئ	Quiescence Search
تمديدات غير عادية	Singular Extensions
التقليم التقدمي	Forward Pruning
بحث شجرة مونتي كارلو	Monte Carlo Tree Search (MCTS)
UCT (الحد الأعلى للثقة في الشجرة)	UCT (Upper Confidence Bound for Trees)
الألعاب العشوائية	Stochastic Games
الطاولة	Backgammon
قيمة توقعات الحد الأعلى	Expectiminimax Value
الألعاب القابلة للمشاهدة جزئيًا المحددة	Deterministic Partially Observable Games
كريجسبيل	Kriegspiel
خوارزمية البحث AND-OR	AND-OR Search Algorithm
خوارزمية الحالة المعتقدة التدريجية	Incremental Belief-State Algorithm
الوعي النبهاني	Clairvoyance
المنطق العليا	Metareasoning

Sarah111-AHM / Semsmah

Chapter 6 اساسيات علم البيانات والذكاء الاصطناعي #51

Summary of the chapter in the form of points

funny summary of the summary 😄

summary of the informationin the form of a table ;)

important terms and their translations in English ;)