This PR introduces a new TIR schedule primitive annotate_buffer_access that allows explicit annotation of buffer access regions for both reads and writes.
Motivation
TVM currently does not support inferring the numerical range of floating-point calculations. As a result, buffer access regions involving floating-point calculations cannot be accurately inferred and default to the full extent of the buffer. This new primitive addresses this limitation by allowing manual specification of access regions.
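To make the limitation concrete, the sketch below mimics the index computation of a resize in plain Python, without TVM. The half-pixel coordinate transform and the scale of 2.0 are assumptions chosen for illustration and are not taken from this PR. The source index passes through floating-point arithmetic before being turned back into an integer, and that is the step region inference cannot bound, even though the indices one tile actually touches form a small window.

```python
import math

def src_index(dst: int, scale: float) -> int:
    """Map an output coordinate to an input coordinate (assumed half-pixel transform)."""
    return math.floor((dst + 0.5) * scale - 0.5)

def touched_range(tile_start: int, tile_size: int, scale: float) -> tuple:
    """Brute-force the inclusive range of source indices that one output tile touches."""
    idx = [src_index(d, scale) for d in range(tile_start, tile_start + tile_size)]
    return min(idx), max(idx)

lo, hi = touched_range(tile_start=10, tile_size=10, scale=2.0)
print(lo, hi)
```

Interpolation taps around each source index widen this window somewhat, but it stays far smaller than the full buffer.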
Usage scenarios
This primitive is particularly useful for operations where the default buffer region inference may not capture the precise access patterns, such as in resize operations. It overrides the automatically inferred region for the specified buffer.
The primitive adds an annotation, for example T.block_attr({"explicit_read_region": [0]}), to the block, indicating that an explicit region has been provided for the buffer at the listed index (here, read buffer 0). The CompactBufferAllocation pass uses this annotation to respect the manually specified region instead of relying on automatic inference.
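As a conceptual model only (this is not the code of CompactBufferAllocation), the override can be pictured as follows. The function name and the list-of-regions representation are hypothetical; only the attribute key explicit_read_region comes from the description above.

```python
def effective_read_regions(inferred, annotated, block_attrs):
    """Pick, per read buffer, the annotated region if its index is marked explicit.

    inferred / annotated: lists of regions, one per read buffer (hypothetical layout).
    block_attrs: dict of block annotations, e.g. {"explicit_read_region": [0]}.
    """
    explicit = set(block_attrs.get("explicit_read_region", []))
    return [
        annotated[i] if i in explicit else inferred[i]
        for i in range(len(inferred))
    ]

# Buffer 0 was annotated by hand; buffer 1 keeps its inferred region.
inferred = [((0, 200), (0, 200)), ((0, 16),)]
annotated = [((17, 41), (17, 41)), None]
attrs = {"explicit_read_region": [0]}
print(effective_read_regions(inferred, annotated, attrs))
```

Buffers whose index is not listed in the annotation keep whatever region inference produced.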
Resize Op Tile Example
We can optimize the tiling of the "cache" block for the "resize" operation using the annotate_buffer_access primitive.
First split the i2 and i3 loops of the "resize" block, then compute-at the "cache" block at the outer tile loops of the resize. This is a typical tiling schedule:
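```python
# Assumes s is a tir.Schedule and that resize_block and cache_block
# were already looked up (for example with s.get_block).
h, w = s.get_loops(resize_block)[-2:]  # last two loops of the resize block
ho, hi = s.split(h, factors=[10, 10])  # 10 tiles of 10 iterations each
wo, wi = s.split(w, factors=[10, 10])
s.reorder(ho, wo, hi, wi)              # tile loops outside, intra-tile loops inside
s.compute_at(cache_block, wo)          # compute the cache once per tile
```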
After compute-at, the "cache" block still reads the entire 200x200 region, because the read region of the "resize" block cannot be inferred precisely and falls back to the full buffer. To fix this, use annotate_buffer_access to explicitly annotate the read region of the "resize" block.
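A sketch of what such a call might look like. The argument order (block, buffer index, "read" or "write", then a callable over the block's iteration variables that returns one (start, end) pair per buffer dimension) and the specific window 2*h - 3 to 2*h + 3 are assumptions, not taken from this description; check the primitive's docstring before relying on them. The window assumes a 2x downscale and is chosen only because it yields a 24-wide footprint for a 10-wide tile.

```python
def resize_read_window(n, c, h, w):
    """Hypothetical read window of the resize block for output point (n, c, h, w).

    Each entry is an (assumed half-open) (start, end) pair for one input dimension.
    Assumes a 2x downscale and a margin of 3 elements on each side.
    """
    return (
        (n, n + 1),
        (c, c + 1),
        (h * 2 - 3, h * 2 + 3),
        (w * 2 - 3, w * 2 + 3),
    )

def annotate_resize_reads(s, resize_block):
    """Annotate read buffer 0 of the resize block (assumed call signature)."""
    s.annotate_buffer_access(resize_block, 0, "read", resize_read_window)
```

Keeping the window in a named function makes it easy to evaluate with plain integers and sanity-check it before handing it to the schedule.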
With the annotation applied, the "cache" block reads only the necessary 24x24 region instead of the entire 200x200 input (576 elements instead of 40,000). This reduces memory traffic and the per-tile cache footprint, and the benefit grows with input size.
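The 24-element extent can be reproduced with simple arithmetic, assuming (hypothetically) that each output row h reads the half-open input range [2*h - 3, 2*h + 3) and that a tile spans 10 consecutive output rows, as in the split above. Neither the factor 2 nor the margin 3 is stated in this description; they are one choice consistent with the numbers it quotes.

```python
def tile_footprint(tile_start, tile_size=10, scale=2, margin=3):
    """Union of the per-row windows [scale*h - margin, scale*h + margin) over one tile.

    The windows of neighboring rows overlap, so the union is one contiguous range.
    Returns a half-open (start, end) pair.
    """
    first = tile_start
    last = tile_start + tile_size - 1
    return scale * first - margin, scale * last + margin

start, end = tile_footprint(tile_start=10)
extent = end - start
print(start, end, extent)
```

Tiles at the border produce indices outside the buffer under this formula, so a real annotation would still need to account for boundary handling.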
Note
Caution should be exercised when using this function, as incorrect annotations may lead to incorrect code generation or runtime errors. It's crucial to ensure that the specified region covers all actual reads or writes performed by the block for the given buffer.
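One way to guard against an under-sized annotation is to brute-force the indices a block really touches and compare them with the declared window before scheduling. The sketch below does this in plain Python for a single axis; the access function (a floored half-pixel transform plus a few neighboring taps) and both candidate windows are hypothetical stand-ins, not this PR's kernel.

```python
import math

def accessed_indices(h, scale=2.0, taps=(-1, 0, 1, 2)):
    """Source rows read for output row h (assumed transform and tap offsets)."""
    base = math.floor((h + 0.5) * scale - 0.5)
    return [base + t for t in taps]

def window_covers(window, rows):
    """True if, for every output row, all accessed rows lie in its half-open window."""
    for h in rows:
        start, end = window(h)
        if any(not (start <= i < end) for i in accessed_indices(h)):
            return False
    return True

def safe_window(h):
    return 2 * h - 3, 2 * h + 3

def too_small_window(h):
    return 2 * h, 2 * h + 2

print(window_covers(safe_window, range(100)))
print(window_covers(too_small_window, range(100)))
```

A window that fails this kind of check would let the compacted buffer omit elements the block still reads.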
cc @Hzfengsy @junrushao