apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0
11.6k stars 3.44k forks source link

[Tracking Issue] BufferAccess Migration Followup Items #10505

Open tqchen opened 2 years ago

tqchen commented 2 years ago

This is a tracking issue created for followup tracking items in https://github.com/apache/tvm/pull/9727.

The particular PR contains an important new feature(transform_layout). The main goal of this issue is to do followup shepherding around the IR change part(buffer access and related allocations), which have a broader impact. The particular PR takes many rounds of reviews and but still have quite a few points that would benefit from systematic clarification, cleanup and further polish.

Given that there is an important followup(of layout transform) that depend on the change, and the PR is already at a state where incremental changes can further block the progress as other concurrent changes are made. We took the deliberation to merge PR, then continue to do post-hoc shepherding. This decision is supposed to happen in rare circumstance with clear followup actions to backup the decisions.

This is the tracking issue for followup shepherd actions

High-level gist of change

Load/Store are deprecated and changes to BufferLoad/Store of 1D array. BufferLoad/Store of 1D elements are being used in the place of Load/Store. Overall this is a good directly as we unify more analysis onto the (possibly multi-dimensional) buffer access, opens doors for special memory unit that are in nature multi-dimensional.

However, we do need more clarifications and systematic review/cleanup (see check list).

Check List

cc @Hzfengsy @junrushao

tqchen commented 2 years ago

cc @vinx13 @junrushao1994 @masahi @Lunderberg

masahi commented 2 years ago

Can we remove Load / Store node altogether now?

tqchen commented 2 years ago

Let us first finish the items, streamline changes and likely remove load/store in next release. So we have sufficent timewindow and error message for people to react

Lunderberg commented 2 years ago

Discuss and clarify the scope of vector buffer indexing rule

I agree with adding additional restrictions. I've created draft PRs https://github.com/apache/tvm/pull/10512 and https://github.com/apache/tvm/pull/10513, to test the impact of different rules on existing test cases. Each requires that only a single index can have multiple lanes, while #10513 additional requires that to be the last index.

@tqchen After thinking on your comment here, I agree that if there are no issues introduced by using the stricter restriction, we should use it, as we can also loosen the condition later if needed.

wrongtest-intellif commented 2 years ago

Thanks for the great efforts!

Here is my question: Whether we will have some SSA (or no-alias?) property on Buffer object in low-level tir? Previously we can treat every buffer var uniquely defined by Allocate and used by Load/Store/Opaque accesses. Now there seems to be some unclearness. For example, what if we create multiple buffers on the same buffer var associated by Allocate stmt?

tqchen commented 2 years ago

@wrongtest great point. I agree it is important to summarize the aliasing property. @vinx13 perhaps can add more comments on this

vinx13 commented 2 years ago

While tir.noalias property still hold for PrimFunc args, T.decl_buffer creates a buffer alias if the underlying data variable (.data field) overlaps with another buffer. Buffer created via alloc_buffer always do not alias. Buffer aliases do not need Allocate to create the data variable. Each data variable can be allocated via Allocate once. If a transformation would produce multiple allocations of the same buffer var (e.g. unrolling a loop that contains an allocation), the transform should update the allocations to be unique using tvm::tir::ConvertSSA

Buffers should not alias each other unless necessary, because aliased buffers increase complexity for TIR transformations. Passes that rewrite buffers should clearly indicate how aliased buffers are handled. For example, when changing the underlying layout of stored elements in a buffer, all buffer aliases must also be updated. While buffer aliasing is typically free at runtime, this imposes a cost for buffer aliasing both to compile times and development complexity.

We should also extend VeryfySSA pass to also take buffer into consideration.

Lunderberg commented 2 years ago

Both PRs for testing indexing restrictions, https://github.com/apache/tvm/pull/10512 and https://github.com/apache/tvm/pull/10513, have passed CI. I think I agree with the tighter restriction, that only the last index may have more than one lane. While there are cases where a ramp index in an earlier lane could be transformed into a ramp index in the last lane, those would be relatively rare, as it would require a ramp index to be present in user-defined calls.

I have closed #10512, and converted #10513 from a draft PR to an open PR.

tqchen commented 2 years ago

Thanks @Lunderberg , if @vinx13 you also agree, can you review and merge https://github.com/apache/tvm/pull/10513?

wrongtest-intellif commented 2 years ago

Since Buffer object takes an elem_offset field (be zero by default), perhaps we should also process it in CodeGen. It could be useful if we want to create aliased buffers to sub-regions. The proposed changes are here: https://github.com/apache/tvm/pull/10582

Lunderberg commented 2 years ago

Also related to elem_offset, I think it will need to allow multiple independent offsets. When a tensorized buffer is lowered, the elem_offset is used to store the flattened index. For non-flat memory buffers, this requires multiple indices, but only one can be stored there. I think both elem_offset and offset_factor will require one parameter per physical dimension, and will need to be changed to arrays.

I have the proposed changes in #10816 , which should maintain backwards compatibility on the python API, and for constructor calls to Buffer on the C++ API.

tqchen commented 2 years ago

Thanks @Lunderberg , it would be great to deliberate on the choice of elem_offset, to see if we have other alternatives, as that would also mean a major change.

Lunderberg commented 2 years ago

Following some debugging with @areusch on an issue caused by preflattened_buffer_map, I did some experimenting on how hard it would be to remove it altogether. #10940 is a proof of concept for keeping the preflattened shape in the buffer_map, while the function body refers only to flattened aliases of the user-provided buffers. It looks like it is working pretty well, and intend to open an RFC for further discussion on it in the near future.

tqchen commented 2 years ago

@Lunderberg this also relates to https://github.com/apache/tvm-rfcs/pull/63/ where we can use buffer declaration to create alias, I think this should be the right way do go. Additionally, it would be great to make buffer declaration(alias creation) explicit

Lunderberg commented 2 years ago

Thank you, and I feel silly for forgetting RFC#63 after the draft discussions, and definitely agreed on the explicit aliasing being a better option for the long term.

Lunderberg commented 1 year ago

As part of the follow-up (and as part of RFC#70), PR #10940 has now landed, removing the pre-flattened buffer map. The PrimFuncNode::buffer_map now contains the original pre-flattened buffer shape through all TIR lowering steps, and flattened buffers are represented as aliases into the argument buffer.

(And updated the tracking issue's checklist.)