ganler / ResearchReading

General system research material (not limited to papers) reading notes.

MLSys'21 | Value Learning for Throughput Optimization of Deep Learning Workloads #68

Closed: ganler closed this issue 2 years ago

ganler commented 2 years ago

Paper Summary

This paper leverages Reinforcement Learning (RL) to iteratively construct near-optimal schedules for tensor programs. Domain-specific tensor computing languages such as Halide and TensorIR separate the computing logic (the program) from the optimizations (the schedule). Given a program, a tensor compiler aims to generate a high-performance schedule for a specific target platform. Prior and recent autotuning work generates random schedules and uses learning-based techniques to predict their performance, so the schedule with the lowest predicted runtime is selected for code generation.

In addition to schedule mutations (e.g., changing split sizes, reordering the loop structure, vectorizing loops, inserting temporary buffers, and adding thread parallelism), this paper uses RL to iteratively pick the most promising schedule candidates at each stage from a group of scheduling candidates (i.e., beam search). The learned value model predicts the runtime of each (partial) schedule, and the fastest schedule found is recorded. In this way, for a pipeline with N stages and M candidate schedules per stage, the search complexity is O(N × M) rather than O(M^N); a sketch of this value-guided search is given below.

Their feature engineering uses three categories of features:

1. FLOPs, integer operations, and memory access patterns;
2. counts such as the number of vectorized instructions, unique cache lines accessed, and bytes read and written;
3. features derived from the original ones (e.g., the ratio of vectorized instructions).

The value model is a Bi-LSTM. Experimental results show speedups of 2.6× over Halide and 1.5× over TVM.
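Below is a minimal, illustrative sketch (not the paper's code) of value-guided beam search over per-stage schedule candidates, which is how I understand the O(N × M) argument. The callables `enumerate_candidates` and `predict_runtime` are hypothetical placeholders for the schedule mutator and the learned value model.

```python
def beam_search_schedule(stages, enumerate_candidates, predict_runtime,
                         beam_width=8):
    """Extend partial schedules stage by stage, keeping only the `beam_width`
    partial schedules with the lowest predicted runtime.

    With N stages and M candidates per stage, this explores on the order of
    N * M * beam_width states instead of the full M^N cross product.
    """
    beam = [[]]  # start from the empty partial schedule
    for stage in stages:
        expanded = []
        for partial in beam:
            for candidate in enumerate_candidates(stage, partial):
                new_partial = partial + [candidate]
                # The value model scores a partial schedule by predicting the
                # runtime achievable from it.
                expanded.append((predict_runtime(new_partial), new_partial))
        # Keep the most promising partial schedules for the next stage.
        expanded.sort(key=lambda scored: scored[0])
        beam = [partial for _, partial in expanded[:beam_width]]
    # The fastest complete schedule found is used for code generation.
    return min(beam, key=predict_runtime)
```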
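And a minimal sketch, assuming PyTorch, of what a Bi-LSTM value model over per-stage feature vectors could look like; the feature dimension, hidden size, and pooling choice are my assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class BiLSTMValueModel(nn.Module):
    """Maps per-stage feature vectors of a (partial) schedule to a predicted runtime."""

    def __init__(self, feature_dim=80, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, 1)  # regress a scalar runtime

    def forward(self, stage_features):
        # stage_features: (batch, num_stages, feature_dim), one vector per
        # pipeline stage (FLOPs, memory-access counts, derived ratios, ...).
        out, _ = self.lstm(stage_features)
        # Pool over stages and predict the runtime of the whole schedule.
        return self.head(out.mean(dim=1)).squeeze(-1)

# Example usage with random features for a batch of 4 five-stage pipelines:
model = BiLSTMValueModel()
pred = model(torch.randn(4, 5, 80))  # -> tensor of shape (4,)
```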

Strength

Weakness