This PR introduces the Dynamic Gradient Descent (DGD) Search algorithm for accelerating the auto-tuning process of GPU kernels within the Ansor/AutoScheduler framework. The DGD algorithm is designed to explore the search space more efficiently than the existing Genetic Algorithm-based approach. The following changes are included:
Dynamic Gradient Descent Search:
Implements a new search strategy that uses gradient descent in a multi-dimensional tile-space.
Utilizes online measurements and a proxy model to guide the search process (see the sketches after this list).
Record Processor:
A new class to handle the processing and modification of measure records.
Includes methods to extract and modify SP node coordinates.
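For intuition, here is a minimal conceptual sketch of the search loop, assuming a tile configuration is a tuple of positive tile sizes and that `measure_latency` is a user-supplied callable. None of the names below come from the PR's implementation, and the proxy-model filtering of candidates is omitted for brevity:

```python
# Conceptual sketch of gradient descent in a multi-dimensional tile-space.
# All names (neighbors, measure_latency, dgd_search) are illustrative and
# are NOT the API introduced by this PR.

def neighbors(point):
    """Yield tile configurations one step away along each dimension."""
    for dim in range(len(point)):
        for delta in (-1, +1):
            candidate = list(point)
            candidate[dim] += delta
            if candidate[dim] > 0:  # tile sizes must stay positive
                yield tuple(candidate)

def dgd_search(start, measure_latency, max_steps=100):
    """Descend through tile-space, guided by online measurements."""
    best, best_cost = start, measure_latency(start)
    for _ in range(max_steps):
        moved = False
        # Probe the neighborhood and step toward the best improvement found.
        for candidate in neighbors(best):
            cost = measure_latency(candidate)
            if cost < best_cost:
                best, best_cost = candidate, cost
                moved = True
        if not moved:  # local minimum: no neighbor improves the current point
            break
    return best, best_cost
```

The record-processor side can be pictured as follows. This is an illustrative sketch only: Ansor stores measure records as JSON lines whose transform steps include "SP" (split) entries, but the exact indices used below are assumptions about that layout, not the new class's API:

```python
import json

def extract_sp_lengths(record_line):
    """Collect the tile-size list of every SP (split) step in one record.

    The indexing into record["i"][1][1] assumes the usual layout of Ansor
    JSON log lines; adjust if the record schema differs.
    """
    record = json.loads(record_line)
    steps = record["i"][1][1]
    return [step[4] for step in steps if step[0] == "SP"]

def replace_sp_lengths(record_line, new_lengths):
    """Return a copy of the record with its SP tile sizes rewritten."""
    record = json.loads(record_line)
    steps = record["i"][1][1]
    replacements = iter(new_lengths)
    for step in steps:
        if step[0] == "SP":
            step[4] = next(replacements)
    return json.dumps(record)
```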
This implementation is based on the algorithm described in the paper "Accelerated Auto-Tuning of GPU Kernels for Tensor Computations" presented at ICS'24. Experimental evaluation on a number of matrix-matrix multiplication and convolution kernels shows that the DGD algorithm achieves an order-of-magnitude reduction in auto-tuning time while maintaining comparable code performance.
Usage:
To use the DGD Search algorithm, instantiate the `DynamicGradientSearchTuner` class with the desired parameters and call its `dynamic_gradient_search` method, as in the example below.
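A minimal usage sketch follows. The `SearchTask` setup uses the standard auto_scheduler API; the import path and constructor arguments of `DynamicGradientSearchTuner` are assumptions for illustration, since the final signature may differ:

```python
import tvm
from tvm import te, auto_scheduler
# The import path of the new tuner is assumed here for illustration.
from tvm.auto_scheduler import DynamicGradientSearchTuner

@auto_scheduler.register_workload
def matmul(N, L, M):
    A = te.placeholder((N, L), name="A")
    B = te.placeholder((L, M), name="B")
    k = te.reduce_axis((0, L), name="k")
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
    return [A, B, C]

target = tvm.target.Target("cuda")
task = auto_scheduler.SearchTask(func=matmul, args=(1024, 1024, 1024), target=target)

# Instantiate the tuner and run the DGD search; the keyword arguments
# shown here (log file name, trial budget) are hypothetical.
tuner = DynamicGradientSearchTuner(task, log_file="matmul_dgd.json", n_trials=1000)
tuner.dynamic_gradient_search()
```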
Experiments setup:
The experiments ran the DGD Search algorithm under two budgets, a 1-hour limit and the full duration used by Ansor, and compared the resulting code performance against what Ansor achieves after its suggested number of trials. The models used for the evaluation were Bert, ResNet-50, and MobileNetV2, with the following configurations taken from the Apache blog "Introducing TVM Auto-scheduler (a.k.a. Ansor)":
Bert: 12000 trials, running on an Nvidia RTX 4090 for 6 hours.
ResNet-50: 20000 trials, running on an Nvidia RTX 4090 for 10 hours.
MobileNetV2: 16000 trials, running on an Nvidia RTX 4090 for 7 hours.
Relative performance of the DGD Search algorithm after 1 hour and after the full duration used by Ansor:

| Networks | Ratio (1 hour) | Ratio (full) |
| --- | --- | --- |
| Bert | 93.71% | 100.15% |
| ResNet-50 | 90.46% | 96.73% |
| MobileNetV2 | 95.08% | 101.75% |
This table reports the performance of the code found by DGD Search relative to Ansor's final result, both under a 1-hour budget and when given Ansor's full duration. The ratios show that DGD Search reaches comparable performance, in two cases slightly exceeding Ansor, within a significantly reduced time frame.