slyubomirsky opened 1 year ago
@tqchen A question about implementing one of the relatively simple cases, an in-place operation where the result is smaller than the input. I discussed with @MasterJH5574 and we weren't entirely sure about how this should work.
Let's suppose we have a call `out = call_tir_inplace(some_func, (t1,), return_shape)`, where `return_shape` is smaller (in at least one dimension) than `t1`. `out` will be treated as a tensor of the smaller shape (`return_shape`), but will in reality be stored in the same place as `t1`. @MasterJH5574 points out that slicing is an example where this could arise.
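As a rough analogy (using NumPy purely for illustration, not Relax itself), a sliced view reports a smaller shape while sharing storage with the original tensor, which is essentially the situation described above:

```python
import numpy as np

# t1 is the "input" tensor; out is a smaller result stored in the same buffer,
# analogous to the in-place slicing case described above.
t1 = np.arange(16, dtype="float32").reshape(4, 4)
out = t1[:2, :2]  # smaller shape, but no new allocation

assert out.shape == (2, 2)
assert np.shares_memory(out, t1)  # same underlying storage as t1
```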
The question is, do we need to have any special handling in the memory planner for this case? Would Relax's runtime treat `out` as being of `t1`'s shape even though it's supposed to be smaller? Where might we need to make changes to handle this case? I could imagine some difficulties potentially arising with strides, layout, etc.
(For now, I will implement the scenario where the output shape matches the input shape exactly.)
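One concrete instance of the stride concern (again sketched in NumPy as an analogy): a view with the smaller shape inherits the row stride of the original buffer, which differs from the compact strides a freshly allocated tensor of that shape would have:

```python
import numpy as np

full = np.zeros((4, 4), dtype="float32")
view = full[:, :2]  # smaller shape laid out over the original storage
fresh = np.zeros((4, 2), dtype="float32")  # compact allocation of the same shape

assert view.shape == fresh.shape == (4, 2)
assert view.strides == (16, 4)   # row stride of the original 4-wide float32 buffer
assert fresh.strides == (8, 4)   # compact row stride for a width-2 buffer
```

So any component that assumes compact layout for a tensor of shape `(4, 2)` would misread the in-place result.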
Alternative approach suggested by @tqchen and @psrivas2: Consider dataflow blocks only. This would have the advantage of avoiding a whole-program analysis for liveness and aliases and would be a large simplification due to not having to handle control flow, but the risk would be that the alias analysis would have to be overly conservative since any value that comes from outside the dataflow block would have to be treated as potentially an alias. This is worth trying on a real example (say, an excerpt from an LLM). If these very conservative versions of the analysis turn out to be sufficient, then that would be a reasonable starting point.
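The conservative rule described above can be sketched in a few lines (hypothetical helper names, not TVM APIs): only values bound inside the dataflow block can be proven alias-free, and everything arriving from outside must be assumed to alias.

```python
# Hypothetical sketch of the conservative aliasing rule: any value defined
# outside the dataflow block is treated as a possible alias, so only values
# created inside the block can be proven fresh.
def maybe_aliased(value: str, block_defined: set) -> bool:
    """Return True unless `value` is provably a fresh binding in this block."""
    return value not in block_defined

block_defined = {"lv0", "lv1"}  # bindings created inside the dataflow block
assert maybe_aliased("weight", block_defined)   # from outside: assume aliased
assert not maybe_aliased("lv0", block_defined)  # defined inside: known fresh
```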
The original proposal has been implemented in #16129, though focusing mainly on dataflow blocks instead. I am also now working on an addition to give special handling to `split` and `concat` (when these are eligible to be done in-place, they can be implemented as "no-ops" simply by taking views of the underlying storage).
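As an illustrative analogy only, NumPy's `split` already behaves this way: the resulting pieces are views of the original buffer, so the operation performs no copy.

```python
import numpy as np

a = np.arange(8)
left, right = np.split(a, 2)

# Both halves are views into a's storage: the "split" is effectively a no-op.
assert np.shares_memory(left, a) and np.shares_memory(right, a)
left[0] = 100
assert a[0] == 100  # writes through the view are visible in the original
```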
We can use the more complex approach of #15689 to implement a more general version that does not require dataflow blocks.
Per the discussion on in-place updates, this is a tracking issue to discuss the steps and implementation details.

- Implement the `call_tir_inplace` operator. This will handle the "simple case" described in the discussion thread, where the input tensor must be at least large enough to hold the desired output. At this stage, we will not handle memory planning for the cases where the input tensor is too small to hold the output (which would require the memory planner to ensure that the underlying storage is large enough).
- Automatically detect eligible `call_tir_inplace` invocations. At this stage, the memory planner should also be modified to handle cases where the input tensor needs a larger underlying storage.

cc @quic-sanirudh
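The "simple case" eligibility condition from the first step boils down to a byte-size comparison between output and input. A minimal sketch (a hypothetical helper, not the actual implementation):

```python
import numpy as np

def fits_in_place(in_shape, out_shape, dtype="float32"):
    """Return True if an output of out_shape can reuse the storage of an
    input of in_shape (the "simple case"); larger outputs would need the
    memory planner to allocate bigger underlying storage."""
    itemsize = np.dtype(dtype).itemsize
    out_bytes = itemsize * int(np.prod(out_shape))
    in_bytes = itemsize * int(np.prod(in_shape))
    return out_bytes <= in_bytes

assert fits_in_place((4, 4), (2, 2))      # smaller output: eligible
assert not fits_in_place((2, 2), (4, 4))  # larger output: needs planner support
```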