Open jinevening opened 3 weeks ago
To anyone interested in this issue: I've heard that SSAFY(Samsung Software Academy For Youth) mentors (@shs-park and @zetwhite) are interested in the second milestone of this issue. Please do not assign yourself to the second milestone.
I assigned myself to `batchmatmul` to understand this issue well.
Before implementing shape inference logic for each operator, we have to answer the question below.
Can we assume an unknown dimension is valid?
Let's consider a concat operator with two inputs:
```
[1,4,3,1]
[1,4,?,1]
```
If `axis` of concat is 2, we can simply say that the output shape will be `[1,4,?,1]`.
But if `axis` of concat is 1, what would be the result? I think there are three choices.
(1) `[1,8,3,1]`
(2) `[1,8,?,1]`
(3) Stop shape inference
(1) assumes `?` is 3, which is the only valid value according to the semantics of concat.
(2) just propagates the unknown dimension to the same dimension of the output.
(3) seems excessive to me. Actually, we can guess the shape of the output.
I'd like to go with (1), because a proper backend (which can see the actual value of `?`) should raise an error for a wrong input. For example, if a user gives a wrong-shaped tensor (`?` != 3), an error message will be printed like "non-axis dimension of all inputs must be the same". So, the only possible shape of the output is `[1,8,3,1]`.
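A minimal sketch of approach (1), assuming `None` stands for an unknown dimension (`?`); the function name and structure are illustrative, not ONE's actual API. Non-axis unknowns are resolved from the known inputs, and the axis dimension is summed:

```python
# Illustrative sketch of approach (1), not ONE's actual implementation.
# None represents an unknown dimension ('?').

def infer_concat_shape(shapes, axis):
    rank = len(shapes[0])
    assert all(len(s) == rank for s in shapes), "all inputs must have the same rank"
    out = []
    for d in range(rank):
        dims = [s[d] for s in shapes]
        if d == axis:
            # Axis dimension: sum of input dims; unknown if any input is unknown here.
            out.append(None if any(v is None for v in dims) else sum(dims))
        else:
            # Non-axis dimension: all known dims must agree; an unknown dim
            # is assumed to equal the known one (approach (1)).
            known = {v for v in dims if v is not None}
            if len(known) > 1:
                raise ValueError("non-axis dimension of all inputs must be the same")
            out.append(known.pop() if known else None)
    return out

print(infer_concat_shape([[1, 4, 3, 1], [1, 4, None, 1]], axis=1))  # [1, 8, 3, 1]
print(infer_concat_shape([[1, 4, 3, 1], [1, 4, None, 1]], axis=2))  # [1, 4, None, 1]
```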
@Samsung/one_compiler Please leave comments about this issue if you have any other suggestions.
@mhs4670go I found that xla also goes with the approach (1). PTAL c1
in https://github.com/tensorflow/tensorflow/blob/v2.17.0/third_party/xla/xla/service/shape_inference.cc#L280-L292
> Please leave comments about this issue if you have any other suggestions.
I'm OK with @jinevening 's conclusion.
Maybe @Samsung/one_onert can give how it is treated in runtime?
I've heard that shape inference is done after an actual input is given. @hseok-oh @ragmani Or, does onert have inference logic for dynamic shape?
> (1) [1,8,3,1] (2) [1,8,?,1]
Have you checked torch or tensorflow for this case?
> I found that xla also goes with the approach (1)
XLA is a compiler for TensorFlow. I didn't check torch.
@jinevening
> I've heard that shape inference is done after an actual input is given. @hseok-oh @ragmani Or, does onert have inference logic for dynamic shape?
Yes, onert has inference logic that infers the shapes of node outputs sequentially, starting from the first dynamic tensor. An actual input can be a candidate for the first dynamic tensor.
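A rough sketch of that sequential inference, under assumed names (not onert's actual API): once the actual input shapes are known, walk the nodes in topological order and recompute each node's output shape.

```python
# Illustrative sketch (not onert's actual API) of sequential shape
# inference: visit nodes in topological order and recompute each
# node's output shape from its (now concrete) input shapes.

def propagate_shapes(nodes, input_shapes):
    """nodes: list of (name, infer_fn, input_names) in topological order."""
    shapes = dict(input_shapes)
    for name, infer_fn, input_names in nodes:
        shapes[name] = infer_fn([shapes[i] for i in input_names])
    return shapes

# Example: a single concat along axis 1, with concrete input shapes.
def concat_axis1(in_shapes):
    out = list(in_shapes[0])
    out[1] = sum(s[1] for s in in_shapes)
    return out

result = propagate_shapes([("concat", concat_axis1, ["x1", "x2"])],
                          {"x1": [1, 2, 3], "x2": [1, 2, 3]})
print(result["concat"])  # [1, 4, 3]
```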
FYI, here is the tflite conversion example.
```python
import tensorflow as tf

x1 = tf.keras.Input(shape=(2, None))
x2 = tf.keras.Input(shape=(2, 3))
concat = tf.keras.layers.Concatenate(axis=1)([x1, x2])
model = tf.keras.models.Model(inputs=[x1, x2], outputs=concat)

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
```
```
Operands: T(subgraph index : tensor index) TYPE (shape) (shape_signature) B(buffer index) (variable) OperandName
T(0:0) FLOAT32 (1, 2, 1) (-1, 2, -1) B(1) serving_default_input_1:0
T(0:1) FLOAT32 (1, 2, 3) (-1, 2, 3) B(2) serving_default_input_2:0
T(0:2) FLOAT32 (1, 4, 3) (-1, 4, 3) B(3) PartitionedCall:0
```
@jinevening @seanshpark
> Please leave comments about this issue if you have any other suggestions.
>
> I'm OK with @jinevening 's conclusion.
> Maybe @Samsung/one_onert can give how it is treated in runtime?
~~In your concat example, onert throws an exception.~~ onert supports the concat op with dynamic shape only when all dimensions are equal except for the concat axis dimension. https://github.com/Samsung/ONE/blob/8188194d4f4ba135227370ea260400fee0c206e9/runtime/onert/core/src/util/ShapeInference.cc#L312
However, I'm not sure if the compiler treats it the same way as onert. In onert, all shapes of node inputs have already been determined (static) when inferring the outputs for each node. So, onert can support the above concat example even if an input has an unknown dimension: if the inferred dimension is equal, onert works well; if it is not, onert throws an exception.
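The runtime-side rule above can be sketched as follows (names are illustrative, not onert's actual API): by this point every input shape is concrete, so concat only needs to check that all dimensions except the concat axis match, and raise otherwise.

```python
# Illustrative sketch of the runtime-side concat check described above
# (not onert's actual code): all input shapes are concrete here.

def runtime_concat_shape(shapes, axis):
    ref = shapes[0]
    for s in shapes[1:]:
        for d, (a, b) in enumerate(zip(ref, s)):
            if d != axis and a != b:
                raise ValueError("non-axis dimension of all inputs must be the same")
    out = list(ref)
    out[axis] = sum(s[axis] for s in shapes)
    return out

print(runtime_concat_shape([[1, 4, 3, 1], [1, 4, 3, 1]], axis=1))  # [1, 8, 3, 1]
```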
From @mhs4670go 's example, with a little change to @jinevening 's case: two inputs with a concat `axis` of 1,
```
[?,4,3,1]
[?,4,?,1]
```
this becomes
```
[?,8,3,1]
```
@mhs4670go , am I correct?
@seanshpark Yes. That's right.
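The merge rule in this example can be checked with a tiny sketch (`None` stands for `?`; `merge_dim` is a made-up helper, not ONE's API): a non-axis dimension where both inputs are unknown stays unknown, while an unknown paired with a known value resolves to the known value.

```python
# Minimal sketch of per-dimension merging for non-axis dimensions
# (None stands for '?'; merge_dim is illustrative, not ONE's API).

def merge_dim(a, b):
    if a is None:
        return b   # unknown resolves to the other input's value
    if b is None:
        return a
    if a != b:
        raise ValueError("non-axis dimensions must match")
    return a

# Non-axis dims of [?,4,3,1] and [?,4,?,1] (skipping axis 1):
merged = [merge_dim(a, b) for a, b in zip([None, 3, 1], [None, None, 1])]
print(merged)  # [None, 3, 1]  -> with the axis dim 4+4=8, output is [?,8,3,1]
```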
This issue is not about `onert`. It is about `luci`.
Why is `luci` going to support dynamic-shaped LLM inference?
Is `luci`'s inferred shape used to generate `circle`?
(I am guessing `luci` is used to determine shape in generating circle.)

I talked with @jinevening offline.
By "to support dynamic-shaped LLM inference", @jinevening means
① Generating circle (FrontEnd) + ② Executing circle (Runtime)
At first, I thought *LLM inference* meant circle execution, and `luci` has `luci-interpreter`.
Thus, I thought this issue aimed to run a circle model using `luci-interpreter`.
In short, `luci` (not `luci-interpreter`) is needed to generate circle, and `onert` will run the generated circle.
## What
Let's support shape inference for operators whose inputs have dynamic shape.
I've made a list of operators to support dynamic-shaped LLM inference.
- First milestone (for token gen model)
- Second milestone (for the whole (prompt parsing, token gen) model)
- Others (for any other issues)
## Why
Emerging models (ex: LLM) require dynamic shape tensors (mainly to deal with varying sequence lengths). However, the current shape inference logic for circle does not correctly handle dynamic shapes. Let's extend the existing shape inference rules to handle dynamic shape tensors.