QiXuanWang / LearningFromTheBest

This project lists the best books, courses, tutorials, and methods for learning certain knowledge areas

Chip Design with Deep Reinforcement Learning By: Google (Azalia Mirhoseini et al., Jeff Dean) #49


QiXuanWang commented 3 years ago

- Link: https://arxiv.org/abs/2004.10746
- Link2: https://ai.googleblog.com/2020/04/chip-design-with-deep-reinforcement.html
- Link3: https://www.semanticscholar.org/paper/Chip-Placement-with-Deep-Reinforcement-Learning-Mirhoseini-Goldie/929bf1a2ff229d34f7907886989c621444c2b8fd?sort=relevance&page=2&citationRankingModelVersion=v0.2.0-0.25 (with citations)

Published in April 2020 on the Google AI blog as well as on arXiv. This is a follow-up to Jeff Dean's presentation and is the first official technical paper from Google on this work. The project was a collaboration between the Google Research and Google Hardware and Architecture teams.

Problem: They select chip floorplanning (macro placement) as the target problem.

Innovation: The input to our model is the chip netlist (node types and graph adjacency information), the ID of the current node to be placed, and some netlist metadata, such as the total number of wires, macros, and standard cell clusters. The netlist graph and the current node are passed through an edge-based graph neural network that we developed to encode the input state. This generates embeddings of the partially placed graph and the candidate node.

[Figure] A graph neural network generates embeddings that are concatenated with the metadata embeddings to form the input to the policy and value networks.
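
As a rough illustration of the edge-based encoding idea, here is a minimal message-passing step in plain Python/NumPy; the feature layout, layer sizes, ReLU activations, and mean aggregation are assumptions made for this sketch, not the paper's exact architecture.

```python
import numpy as np

def edge_gnn_layer(node_feats, edge_index, W_edge, W_node):
    """One illustrative edge-centric message-passing step over the netlist graph.

    node_feats: (N, F) per-node features (e.g., node type, size)
    edge_index: (E, 2) integer pairs of (src, dst) node indices
    W_edge:     (2F, D) learned projection producing edge embeddings
    W_node:     (D, F)  learned projection mapping aggregated edges back to nodes
    """
    src, dst = edge_index[:, 0], edge_index[:, 1]
    # Edge embedding: concatenate the two endpoint features, then project.
    edge_in = np.concatenate([node_feats[src], node_feats[dst]], axis=1)  # (E, 2F)
    edge_emb = np.maximum(edge_in @ W_edge, 0.0)                          # ReLU, (E, D)
    # Node update: mean of incoming edge embeddings, projected back to node space.
    agg = np.zeros((node_feats.shape[0], edge_emb.shape[1]))
    np.add.at(agg, dst, edge_emb)
    deg = np.maximum(np.bincount(dst, minlength=node_feats.shape[0]), 1)[:, None]
    new_node_feats = np.maximum((agg / deg) @ W_node, 0.0)                # (N, F)
    # A whole-graph embedding (here, the mean edge embedding) can be concatenated
    # with the metadata embedding to feed the policy and value networks.
    graph_emb = edge_emb.mean(axis=0)
    return new_node_feats, graph_emb
```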


The edge, macro and netlist metadata embeddings are then concatenated to form a single state embedding, which is passed to a feedforward neural network. The output of the feedforward network is a learned representation that captures the useful features and serves as input to the policy and value networks. The policy network generates a probability distribution over all possible grid cells onto which the current node could be placed. In each iteration of training, the macros are sequentially placed by the RL agent, after which the standard cell clusters are placed by a force-directed method, which models the circuit as a system of springs to minimize wirelength. RL training is guided by a fast-but-approximate reward signal calculated for each of the agent’s chip placements using the weighted average of approximate wirelength (i.e., the half-perimeter wirelength, HPWL) and approximate congestion (the fraction of routing resources consumed by the placed netlist).

[Figure] During each training iteration, the macros are placed by the policy one at a time and the standard cell clusters are placed by a force-directed method. The reward is calculated from the weighted combination of approximate wirelength and congestion.
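
To tie the pieces together, here is a hedged sketch of one training episode as described above, with each macro treated as occupying a single grid cell for simplicity; `policy_probs`, `place_std_cells_force_directed`, `hpwl`, and `congestion` are hypothetical helper functions, and the reward weights are illustrative, not the paper's values.

```python
import numpy as np

def run_episode(macros, grid_shape, policy_probs, place_std_cells_force_directed,
                hpwl, congestion, w_wirelength=1.0, w_congestion=0.5):
    """Place macros one at a time with the policy, add standard cells, then score.

    policy_probs(state, macro)            -> flat probability vector over grid cells
    place_std_cells_force_directed(place) -> placement with standard cell clusters added
    hpwl(place), congestion(place)        -> scalar proxy costs
    """
    rows, cols = grid_shape
    placement = {}                                 # macro id -> (row, col)
    occupied = np.zeros(grid_shape, dtype=bool)
    for macro in macros:
        probs = policy_probs((placement, occupied), macro)   # distribution over cells
        probs = np.where(occupied.ravel(), 0.0, probs)       # mask occupied cells
        probs /= probs.sum()
        cell = np.random.choice(rows * cols, p=probs)        # sample the next action
        r, c = divmod(int(cell), cols)
        placement[macro] = (r, c)
        occupied[r, c] = True
    placement = place_std_cells_force_directed(placement)    # standard cell clusters
    # Reward: negative weighted combination of approximate wirelength and congestion.
    reward = -(w_wirelength * hpwl(placement) + w_congestion * congestion(placement))
    return placement, reward
```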

Overview of Our Approach:

- _states_: the set of possible states of the world (e.g., in our case, every possible partial placement of the netlist onto the chip canvas).
- _actions_: the set of actions that can be taken by the agent (e.g., given the current macro to place, the available actions are the set of all the locations in the discrete canvas space (grid cells) onto which that macro can be placed without violating any hard constraints on density or blockages; see the mask sketch below).
- _state transition_: given a state and an action, this is the probability distribution over next states.
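
To make the action definition above concrete, here is a hypothetical validity mask over grid cells; the density grid, blockage grid, density threshold, and footprint handling are all assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def valid_action_mask(density_grid, blockage_grid, macro_rows, macro_cols,
                      max_density=0.6):
    """Boolean grid of top-left cells where the current macro may legally be placed.

    density_grid:  (R, C) fraction of each grid cell already occupied
    blockage_grid: (R, C) boolean, True where placement is forbidden
    macro_rows, macro_cols: footprint of the macro measured in grid cells
    """
    R, C = density_grid.shape
    mask = np.zeros((R, C), dtype=bool)
    for r in range(R - macro_rows + 1):
        for c in range(C - macro_cols + 1):
            window_density = density_grid[r:r + macro_rows, c:c + macro_cols]
            window_blocked = blockage_grid[r:r + macro_rows, c:c + macro_cols]
            # Placeable only if every covered cell stays under the density cap
            # and no covered cell overlaps a blockage.
            if (window_density < max_density).all() and not window_blocked.any():
                mask[r, c] = True
    return mask
```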

_Reward_ Our goal in this work is to minimize power, performance and area, subject to constraints on routing congestion and density. Our true reward is the output of a commercial EDA tool, including wirelength, routing congestion, density, power, timing, and area. However, RL policies require 100,000s of examples to learn effectively, so it is critical that the reward function be fast to evaluate, ideally running in a few milliseconds. In order to be effective, these approximate reward functions must also be positively correlated with the true reward. Therefore, a component of our cost is wirelength, because it is not only much cheaper to evaluate, but also correlates with power and performance (timing). We define approximate cost functions for both wirelength and congestion, as described in Section 3.3.1 and Section 3.3.5, respectively.
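
The wirelength proxy is cheap because half-perimeter wirelength (HPWL) only needs the bounding box of each net's pins. Below is a minimal version, assuming nets are given as a dict of pin coordinates (a representation chosen for this sketch); the congestion proxy (fraction of routing resources consumed) would be a separate approximate cost and is omitted here.

```python
def half_perimeter_wirelength(nets):
    """Sum of bounding-box half-perimeters over all nets.

    nets: dict mapping net name -> list of (x, y) pin coordinates.
    """
    total = 0.0
    for pins in nets.values():
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        # Half-perimeter of the smallest box enclosing every pin on the net.
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

# Example: a single two-pin net from (0, 0) to (3, 4) contributes 3 + 4 = 7.
print(half_perimeter_wirelength({"net0": [(0, 0), (3, 4)]}))  # 7.0
```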

The results are on par with manual designs for area and power, and notably better for timing. The flow takes only about 6 hours, compared with weeks of manual work. Very promising.