Closed tqchen closed 1 month ago
cc @vinx13 @junrushao1994 @masahi @Lunderberg
Can we remove Load / Store
node altogether now?
Let us first finish the items, streamline changes and likely remove load/store in next release. So we have sufficent timewindow and error message for people to react
Discuss and clarify the scope of vector buffer indexing rule
I agree with adding additional restrictions. I've created draft PRs https://github.com/apache/tvm/pull/10512 and https://github.com/apache/tvm/pull/10513, to test the impact of different rules on existing test cases. Each requires that only a single index can have multiple lanes, while #10513 additional requires that to be the last index.
@tqchen After thinking on your comment here, I agree that if there are no issues introduced by using the stricter restriction, we should use it, as we can also loosen the condition later if needed.
Thanks for the great efforts!
Here is my question: Whether we will have some SSA (or no-alias?) property on Buffer
object in low-level tir? Previously we can treat every buffer var uniquely defined by Allocate
and used by Load
/Store
/Opaque accesses. Now there seems to be some unclearness. For example, what if we create multiple buffers on the same buffer var associated by Allocate
stmt?
@wrongtest great point. I agree it is important to summarize the aliasing property. @vinx13 perhaps can add more comments on this
While tir.noalias
property still hold for PrimFunc
args, T.decl_buffer
creates a buffer alias if the underlying data variable (.data
field) overlaps with another buffer. Buffer created via alloc_buffer
always do not alias. Buffer aliases do not need Allocate
to create the data variable. Each data variable can be allocated via Allocate
once. If a transformation would produce multiple allocations of the same buffer var (e.g. unrolling a loop that contains an allocation), the transform should update the allocations to be unique using tvm::tir::ConvertSSA
Buffers should not alias each other unless necessary, because aliased buffers increase complexity for TIR transformations. Passes that rewrite buffers should clearly indicate how aliased buffers are handled. For example, when changing the underlying layout of stored elements in a buffer, all buffer aliases must also be updated. While buffer aliasing is typically free at runtime, this imposes a cost for buffer aliasing both to compile times and development complexity.
We should also extend VeryfySSA
pass to also take buffer into consideration.
Both PRs for testing indexing restrictions, https://github.com/apache/tvm/pull/10512 and https://github.com/apache/tvm/pull/10513, have passed CI. I think I agree with the tighter restriction, that only the last index may have more than one lane. While there are cases where a ramp index in an earlier lane could be transformed into a ramp index in the last lane, those would be relatively rare, as it would require a ramp index to be present in user-defined calls.
I have closed #10512, and converted #10513 from a draft PR to an open PR.
Thanks @Lunderberg , if @vinx13 you also agree, can you review and merge https://github.com/apache/tvm/pull/10513?
Since Buffer
object takes an elem_offset
field (be zero by default), perhaps we should also process it in CodeGen. It could be useful if we want to create aliased buffers to sub-regions.
The proposed changes are here: https://github.com/apache/tvm/pull/10582
Also related to elem_offset
, I think it will need to allow multiple independent offsets. When a tensorized buffer is lowered, the elem_offset
is used to store the flattened index. For non-flat memory buffers, this requires multiple indices, but only one can be stored there. I think both elem_offset
and offset_factor
will require one parameter per physical dimension, and will need to be changed to arrays.
I have the proposed changes in #10816 , which should maintain backwards compatibility on the python API, and for constructor calls to Buffer
on the C++ API.
Thanks @Lunderberg , it would be great to deliberate on the choice of elem_offset
, to see if we have other alternatives, as that would also mean a major change.
Following some debugging with @areusch on an issue caused by preflattened_buffer_map
, I did some experimenting on how hard it would be to remove it altogether. #10940 is a proof of concept for keeping the preflattened shape in the buffer_map
, while the function body refers only to flattened aliases of the user-provided buffers. It looks like it is working pretty well, and intend to open an RFC for further discussion on it in the near future.
@Lunderberg this also relates to https://github.com/apache/tvm-rfcs/pull/63/ where we can use buffer declaration to create alias, I think this should be the right way do go. Additionally, it would be great to make buffer declaration(alias creation) explicit
Thank you, and I feel silly for forgetting RFC#63 after the draft discussions, and definitely agreed on the explicit aliasing being a better option for the long term.
As part of the follow-up (and as part of RFC#70), PR #10940 has now landed, removing the pre-flattened buffer map. The PrimFuncNode::buffer_map
now contains the original pre-flattened buffer shape through all TIR lowering steps, and flattened buffers are represented as aliases into the argument buffer.
(And updated the tracking issue's checklist.)
This is a tracking issue created for followup tracking items in https://github.com/apache/tvm/pull/9727.
The particular PR contains an important new feature(transform_layout). The main goal of this issue is to do followup shepherding around the IR change part(buffer access and related allocations), which have a broader impact. The particular PR takes many rounds of reviews and but still have quite a few points that would benefit from systematic clarification, cleanup and further polish.
Given that there is an important followup(of layout transform) that depend on the change, and the PR is already at a state where incremental changes can further block the progress as other concurrent changes are made. We took the deliberation to merge PR, then continue to do post-hoc shepherding. This decision is supposed to happen in rare circumstance with clear followup actions to backup the decisions.
This is the tracking issue for followup shepherd actions
High-level gist of change
Load/Store are deprecated and changes to BufferLoad/Store of 1D array. BufferLoad/Store of 1D elements are being used in the place of Load/Store. Overall this is a good directly as we unify more analysis onto the (possibly multi-dimensional) buffer access, opens doors for special memory unit that are in nature multi-dimensional.
However, we do need more clarifications and systematic review/cleanup (see check list).
Check List
preflatten_buffer_map
(possible via buffer redeclaration)address_of
, ideally to 1D acesscc @Hzfengsy @junrushao