apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

[MetaSchedule] Adding post optimization in MetaSchedule to Improve Scheduling #17104

Open canesche opened 2 weeks ago

canesche commented 2 weeks ago

Description

This pull request adds a post-optimization phase to MetaSchedule to improve the schedules it produces. The proposed approach has three steps:

  1. Run MetaSchedule on the end-to-end model to be optimized.
  2. Select the best implementation that MetaSchedule found for the model.
  3. Apply Droplet Search to exploit the neighborhood of the selected candidate.

Using Droplet Search as a post-optimization step (Droplet paper), we reduced the number of trials MetaSchedule explores while still producing faster kernels. We observed this improvement on the following architectures: Nvidia A100, Nvidia 3080, AMD x86, and ARM A64FX. The results can be found in this report: bennu paper

Proposed Changes

Motivation

This pull request adds to MetaSchedule an exploitation phase based on the coordinate descent algorithm. Iteratively refining the best kernel identified by MetaSchedule yields two key benefits:

  1. Reduced Sample Requirements: Coordinate descent search reduces the number of samples MetaSchedule needs to discover high-performing schedules.
  2. Faster Kernels: The refined kernels run faster than those found by MetaSchedule alone, even when MetaSchedule alone is given more samples.

Thus, this PR optimizes MetaSchedule along two crucial dimensions: search efficiency and kernel performance.
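To make the explore-then-exploit idea concrete, here is a minimal, self-contained Python sketch. It is not the code in this PR and does not use any TVM API: `toy_latency`, `explore`, and `coordinate_descent` are hypothetical names, random sampling stands in for MetaSchedule's search, and a plain one-knob-at-a-time descent stands in for Droplet Search. The cost function is an invented stand-in for a measured kernel latency.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]       # knob name -> index into that knob's value list
Space = Dict[str, List[int]]  # knob name -> ordered candidate values
Measure = Callable[[Space, Config], float]


def toy_latency(space: Space, cfg: Config) -> float:
    """Hypothetical stand-in for measuring a kernel; lower is better."""
    tile_x = space["tile_x"][cfg["tile_x"]]
    tile_y = space["tile_y"][cfg["tile_y"]]
    unroll = space["unroll"][cfg["unroll"]]
    return abs(tile_x - 32) + abs(tile_y - 8) + abs(unroll - 4) + 1.0


def explore(space: Space, measure: Measure, trials: int,
            rng: random.Random) -> Tuple[Config, float]:
    """Exploration stand-in: sample random configurations, keep the best."""
    best_cfg: Config = {}
    best_cost = float("inf")
    for _ in range(trials):
        cfg = {knob: rng.randrange(len(vals)) for knob, vals in space.items()}
        cost = measure(space, cfg)
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost


def coordinate_descent(space: Space, measure: Measure, start: Config,
                       start_cost: float) -> Tuple[Config, float, int]:
    """Exploitation: move one knob at a time to a neighbouring value, keep
    any move that lowers the cost, stop when no single-knob move helps."""
    cfg, cost = dict(start), start_cost
    measurements = 0
    improved = True
    while improved:
        improved = False
        for knob, values in space.items():
            for step in (-1, 1):
                idx = cfg[knob] + step
                if not 0 <= idx < len(values):
                    continue  # neighbour would fall outside the knob's range
                candidate = {**cfg, knob: idx}
                cand_cost = measure(space, candidate)
                measurements += 1
                if cand_cost < cost:
                    cfg, cost = candidate, cand_cost
                    improved = True
    return cfg, cost, measurements


space: Space = {
    "tile_x": [1, 2, 4, 8, 16, 32, 64],
    "tile_y": [1, 2, 4, 8, 16],
    "unroll": [1, 2, 4, 8],
}
rng = random.Random(0)
# Steps 1 and 2: a small exploration budget, then pick the best candidate.
seed_cfg, seed_cost = explore(space, toy_latency, trials=10, rng=rng)
# Step 3: refine that candidate with a local search.
final_cfg, final_cost, extra = coordinate_descent(
    space, toy_latency, seed_cfg, seed_cost)
print(seed_cost, final_cost, extra)
```

In this toy setting the cost is unimodal along each knob, so the refinement always reaches the minimum regardless of where exploration left off. Real schedule spaces are not this well behaved, which is why the quality of the starting candidate still matters.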

Testing and Validation

We benchmarked the integration of MetaSchedule and Droplet Search on Nvidia A100, AMD x86, and ARM A64FX architectures. The benchmarks measure kernel speed and search-time reduction against a baseline of MetaSchedule running 10,000 trials. The results are available in Section 3 of this manuscript: paper

Additional Notes

This pull request builds upon prior research and experimentation in model optimization. The proposed approach improves end-to-end model performance across diverse hardware platforms while reducing MetaSchedule's search time. We welcome the community’s feedback, suggestions, and contributions to further refine these methods.

Thank you.

Sincerely,

Michael Canesche, Gaurav Verma, and Fernando Pereira