[x] The Deep Learning Compiler: A Comprehensive Survey: [Paper] [Note]
Graph Optimization
[x] Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks[OSDI'20]: [Paper] [Note]
[ ] PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections[OSDI'21]: [Paper]
Kernel Fusion
[x] AStitch: Enabling a New Multi-dimensional Optimization Space for Memory-Intensive ML Training and Inference on Modern SIMT Architectures[ASPLOS'22]: [Paper] [Note]
[x] Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance[MLSys'22]: [Paper] [Note]
[x] ROLLER: Fast and Efficient Tensor Compilation for Deep Learning[OSDI'22]: [Paper] [Note]
[x] Cocktailer: Analyzing and Optimizing Dynamic Control Flow in Deep Learning[OSDI'23]: [Paper] [Note]
[x] Welder: Scheduling Deep Learning Memory Access via Tile-graph[OSDI'23]: [Paper] [Note]
[x] Chimera: An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion[HPCA'23]: [Paper] [Note]
[x] Effectively Scheduling Computational Graphs of Deep Neural Networks toward Their Domain-Specific Accelerators[OSDI'23]: [Paper]
[ ] Operator Fusion in XLA: Analysis and Evaluation: [Paper]