Make rule for multiple calling of nnfw_set_input_tensorinfo() and nnfw_run()

hyunsik-yoon commented 4 years ago

Rule for multiple calling of nnfw_set_input_tensorinfo() and nnfw_run() is not clear. How about the following rule?

/cc @Samsung/nnfw

Rule 1) nnfw_set_input_tensorinfo() will always set its input tensor dynamic (see case A) and B) in the example below)

Rule 2) Once nnfw_set_input_tensorinfo(new_shape) is called, new_shape will be used for the successive calls of nnfw_run() until another nnfw_set_input_tensorinfo() is called. (see case C) and D))

For example: assuming a model has an input of [1, 128]

nnfw_prepare();
nnfw_run();  // static input tensor input with [1, 128] 

nnfw_set_input_tensorinfo([1, 128]); // app developer shows and intention to change the shape
nnfw_run();  // *dynamic* input tensor with [1, 128] --> case A)

nnfw_set_input_tensorinfo([5, 128]);
nnfw_run();  // *dynamic* input tensor with [5, 128] --> case B)

// once nnfw_set_input_tensorinfo([5, 128]); is called for the previous nnfw_run()
// let the next nnfw_run() also use [5, 128]
nnfw_run();  // *dynamic* input tensor with [5, 128] --> case C)
nnfw_run();  // same with case C)

nnfw_set_input_tensorinfo([1, 128]);
nnfw_run();  // *dynamic* input tensor with [1, 128] --> case D)

FAQ)

Can we run a model with static input tensor in case of case A) or D)?
- Consider the following example, (#0 means tensor with index 0)
```
#0 = input 0 of shape [1]
#1 = input 1 of shape [2]
#2 = relu(#0)
#3 = reshape(const of shape [8] , #1)
#4 = add(#2, #3)
```
- after nnfw_prepare(), #3 and #4 become dynamic and others are static --> state A)
- after calling nnfw_set_input_tensorinfo(#0, [2,2]) and nnfw_run(), all tensors become dynamic
- then, what if we call nnfw_set_input_tensorinfo(#0, [1]) and nnfw_run() again to run with static input tensor?
  - since all tensors became dynamic, we have to move back to the state A)
  - this means that we have to store state after nnfw_prepare(): code will be complex and more memory is required
  - However, if we run it with dynamic input tensor, code will be simpler and does not require the memory

I will start fixing code for this rule after listening to opinions.

krayzemli commented 4 years ago

In MNN, dynamic shapes are not supported, and changing input triggers update: tensor shapes are reinferenced. tensor and internal memory is reallocated, and related internal preprocessing of operators is redone. Hence, it is possible to do the same in ONE when only input shapes are changed, and all internal shapes may be reinferred statically. In MNN, operator configure is two-step, first step done when building graph and second step is done after the shape inference. In ONE, configuration now is single-step, this step is executed after shape inference, but before the operator sequence is built. To reduce reconfiguration overhead, two-step configuration would also be necessary.

glistening commented 4 years ago

@hyunsik-yoon I like simple rule or no rule. However considering some model like your target, for performance, rule 1 and 2 looks reasonable. However, there may be several cases:

1) Some model may remain static though it is called with nnfw_set_input_tensorinfo(). 2) Some app would like to go back to static. 3) nnfw_set_input_tensorinfo() may be used to set other information of tensor though currently it is used to set shape.

It would be good we can remove performance penalty of static input by nnfw_set_input_tensorinfo() for those cases.

wateret commented 4 years ago

@krayzemli

In MNN, dynamic shapes are not supported, and changing input triggers update: tensor shapes are reinferenced. tensor and internal memory is reallocated, and related internal preprocessing of operators is redone. Hence, it is possible to do the same in ONE when only input shapes are changed, and all internal shapes may be reinferred statically.

If nnfw_set_input_tensorinfo is called before nnfw_prepare, we can change the given model's input shapes. Please read comments in nnfw.h for details. However it is not exactly the same with MNN as we do not allow calling nnfw_prepare more than once for a session.

BTW I would like to clarify that we have 3 situations:

Static input, static model (all static, all good)
Changing input shape makes the model static (described above)
The model is going to be dynamic anyway (described below)

About 3rd case, some models require some tensors to be dynamic. For example, when Reshape op gets "shape" tensor as its input which is non-const. So I think re-configuration would not resolve everything. And that's one of the reasons why we allow calling nnfw_set_input_tensorinfo after nnfw_prepare - it is going to be dynamic anyway, so for those models are going to change their tensor shapes for every run. When we first introduced dynamic tensors, we had to consider all those cases so cpu backend is changed to be general, which introduced performance issues we are facing.

In MNN, operator configure is two-step, first step done when building graph and second step is done after the shape inference.

Could you please elaborate what it does for each step?

In ONE, configuration now is single-step, this step is executed after shape inference, but before the operator sequence is built. To reduce reconfiguration overhead, two-step configuration would also be necessary.

Please note that we also have ParallelExecutor which does not fix execution order during compilation but just run the graph in any topological order. I don't know how MNN's two-step configuration works so this is pretty much all that I can say for now.

wateret commented 4 years ago

@hyunsik-yoon

The rules look good to me. It is consistent with other APIs, for example, the buffer that is used for nnfw_set_input is used for the next run(unless user called nnfw_set_input before the next run).

hyunsik-yoon commented 4 years ago

@glistening, for https://github.com/Samsung/ONE/issues/4625#issuecomment-708717267

However, there may be several cases:

1) Some model may remain static though it is called with nnfw_set_input_tensorinfo(). 2) Some app would like to go back to static.

For example, assuming an input is [batch, 100], tflite file will have input shape [1, 100]. we can consider such case. If increasing batch by 1 needs 100ms in case of dynamic input tensor,

batch size	batch = 1	batch = 2	batch = 3	...	batch = n
latency	- when with static input tensor less than 100ms, say 80ms - when with dynamic input tensor 100ms	200ms	300ms	...	n * 100 ms

So if we infer the model with well distributed batch size, 80 ms (static) or 100 ms (dynamic) when batch == 1 may not be that important.

However, if a model is used mostly with batch == 1 and sometimes batch > 1, considering what you mention would be important. Current question is... whether we have such requirement. FYI, for the model I am working on, I am fine with all-dynamic model. input batch size is rarely 1.

3) nnfw_set_input_tensorinfo() may be used to set other information of tensor though currently it is used to reset shape.

You mean, changing only input dtype without changing shape? Good point to consider. I found this (https://developer.android.com/ndk/reference/group/neural-networks?hl=ko#aneuralnetworksexecution_setinput)

Unless the input is omitted, this should be used to specify the dimensions that were left
unspecified when the operand was added to the model. All other properties of the type
must be the same as specified in the model. If the type is the same as specified when
the model was built, NULL can be passed.

So I guess we will be fine for some time being. :-)

krayzemli commented 4 years ago

In MNN, different operators are not allocated between threads, instead each operator uses OpenMP to utilize all threads. In ONE cpu backend, not all (or probably even none) operators support data-based parallelization, so running networks in batch mode in ONE cpu backend look like a doubtful idea anyway.

When using MNN-based applications, I preferred creating several single-thread contexts for the same network and run them in parallel with batch=1, to reduce latency and avoid batch-managing logic, even though it was not memory-effective (still, in theory, it is possible to implement special logic for that in ONE).

Collecting batches is more necessary when utilizing GPUs, which are not always good on small batches. In batch mode, the batch size is typically varied randomly between 1 and some maximum value, depending on some external activity, i.e. batches are collected within some timeout and sent to inference once collecting timeout is exceeded or maximum batch size is reached. In this case, we could in theory allow user to set maximum batch size, so we could allocate tensor memory only once, but that is all the benefit we get,shape inference, etc. will have to be redone anyway. Besides, reconfiguration overhead is not typically expected to be so important in batch mode.

Dynamic models are unlikely to be used in batch mode at all, except when they have been specially designed for that, because a tensor size has to be the same for the whole batch. Since dynamic part of the model may terminate on some constant shape output operator like Reduce, the whole downstream could again be inferred statically, I guess such case is not supported in ONE for now, but instead, once any dynamic tensor is met, all the downstream tensors up to the very end become dynamic.

Since changing tensor size requires some memory-management and shape-inference overhead anyway, it becomes not so important, whether it is done statically or dynamically. So, rerunning static memory allocation and shape inference on input change instead of marking everything dynamic would be good idea, besides, static results may be reused between runs, while dynamic inference is rerun every time. And, I guess, it is not so impossible to change ONE logic to two-step configuration.

Considering again #4544, I think abstract IPortableTensor as input to all cpu backend operators is not good idea. Every operator uses expensive virtual calls to re-read shapes and buffer addresses only to access this buffers directly afterwards without utilizing any abstraction. Since each tensor is typically used in at least two operations, both shapes and buffer addresses of all the tensors could be cached in some backend-specific tensor structure (with a link to corresponding IPortableTensor if necessary) at the very start of the run and then only this non-abstract structure be used in cpu backend operators.

About dynamic data type, I think now cpu backend operators do not support this, as they often read data in configure to do type-specific configuration.

In MNN, two-phase configuration consists of operator constructor run when building graph, and OnResize method that configures the network to specific input shapes, including reallocation of internal buffers. Since MNN is static-shaped, all OnResize methods are executed during shape inference and memory allocation phase. For example, non-quantized binary ops are implemented in CPUBinary.cpp. During first phase, CPUBinaryCreator::onCreate creates either CPUBinaryInt or CPUBinaryFloat object, depending on the data type, and in the second phase, corresponding CPUBinaryInt::onResize or CPUBinaryFloat::onResize, does some shape preprocessing (like we have ProcessBroadcastShape).

glistening commented 4 years ago

@hyunsik-yoon

However, if a model is used mostly with batch == 1 and sometimes batch > 1, considering what you mention would be important. Current question is... whether we have such requirement. FYI, for the model I am working on, I am fine with all-dynamic model. input batch size is rarely 1.

I am thinking of last year model's requirement you've told me. Resizing the input size (row and col), not batch size.

You mean, changing only input dtype without changing shape? Good point to consider. I found this (https://developer.android.com/ndk/reference/group/neural-networks?hl=ko#aneuralnetworksexecution_setinput)

Your reference seems nnapi specific. We may have same limitation on our nnfw API.

hyunsik-yoon commented 4 years ago

@glistening

I am thinking of last year model's requirement you've told me. Resizing the input size (row and col), not batch size.

I am not sure exactly what's your concern but I guess you're mentioning a model of which input in tflite file is, e.g., [1, 128, 128] but its actual input is, e.g., [1, 512, 512]? (some model we mentioned for input reshaping (or input resizing)?) Note that there could be two calling sequences:

1) compile model with [1, 512, 512] (at execution time, input tensor is static) nnfw_set_input_tensorinfo([1, 512, 512]) nnfw_prepare() nnfw_run() .. nnfw_run()

2) change shape to [1, 512, 512] after compilation time, batch == 1 but other dims are different, making input tensor dynamic nnfw_prepare() nnfw_set_input_tensorinfo([1, 512, 512]) nnfw_run() .. nnfw_run()

Your reference seems nnapi specific. We may have same limitation on our nnfw API.

Right. I referred to nnapi just a reference of other framework. We haven't discussed if we can change dtype dynamically. Also our friend nnapi also assumes that dtype cannot be changed. So, for now how about only considering shape?

glistening commented 4 years ago

@hyunsik-yoon

I guess you're mentioning a model of which input in tflite file is, e.g., [1, 128, 128] but its actual input is, e.g., [1, 512, 512]?

Right. If it has been working as dynamic_shape after resizing input, it is okay.

how about only considering shape?

Yes, it looks good.

hyunsik-yoon commented 4 years ago

For this year, how about finalizing rule as mentioned in https://github.com/Samsung/ONE/issues/4625#issue-721225511? My feeling is that https://github.com/Samsung/ONE/issues/4625#issue-721225511 is short term fix but what @krayzemli and @wateret discussed is more into core part, which requires core changes. If we need more discussion on this, I think we'd better discussion long-term && core stuff separating from this issue. (The show must go on! :-)

krayzemli commented 4 years ago

I still have not got what this discussion is all about.

Right now, nnfw_prepare() may be called only once, and it does some internal processing, including shape inference. Once nnfw_set_input_tensorinfo() is called before nnfw_prepare(), we could in theory take advantage of that by not only running shape inference in advance, but do some shape-dependent model optimization during compile step.

Once nnfw_set_input_tensorinfo() is called after nnfw_prepare(), we now just mark all downstream tensors as dynamic and switch them to dynamic shape inference. This is simple approach, but it works. Since static and dynamic tensors use different memory allocation strategy, switching a static tensor to dynamic mode may increase memory usage. Resetting nnfw_set_input_tensorinfo() settings after run() is a bad idea. We could also introduce some premature optimization like memorizing and reusing old static shapes when input again matches the one model was compiled with, or we could reuse statically allocated memory when dynamic tensor becomes smaller that static one, but I see all these as a doubtful idea, that would take work to do, make code more complex and do not bring much improvement.

Yes, dynamic tensors require somewhat more runtime managing overhead, than static. But in principle, both dynamic and static tensor management do same things: shape inference and memory management. And indeed, static memory management may be more effective, and it depends on statically derived shapes, which is possible to do when only inputs change.

Dynamic approach is good for networks which are dependent on it. But in most practical cases they do not (or do, but not much). And so, rerunning static management on changing the input size would be the best approach.

So what I propose is: redesign runtime to make possible redoing static shape/memory management between runs when an input shape changes. I think this will bring most benefit in most practical cases.

When user wants to alternate within a fixed set of input shapes, and shape inference/memory management step becomes a performance stopper for him, he can just create several sessions at the same time, one for each input shape.

hyunsik-yoon commented 4 years ago

@krayzemli,

I agree with what you mentioned. Also I definitely agree that we always have to enhance ONERT with better design. There's some situation regarding this issue. We have some requests to support the rule I mentioned in the first comment. (stuff like case A).. case B).. . ONERT currently crashes.) and this should be done in short time. So, we have to make ONERT run with such cases ASAP with a PR like #4718.

Meanwhile, we can also initiate discussion about better design, which will take more time in discussion and implementation. So IMHO, we'd better discuss the opinion about new design in a separate issue since it has different time-frame. How about creating an issue for your idea?

krayzemli commented 4 years ago

@hyunsik-yoon

What I see there is a bug (wrong comparison), that makes setting the input shape back to original impossible. Indeed this bug needs to be fixed, but this discussion topic is definitely not about this bug, but about something more global.

krayzemli commented 4 years ago

Repeating my opinion from https://github.com/Samsung/ONE/pull/4718#issuecomment-715230581

Using the fact of resizing an input as an explicit signal of making tensor dynamic and trigger dynamic shape inference is a bad idea. Although I don't see a reason for now of why the user may want to turn on dynamic inferring explicitly, but even if such a reason existed, this switching should be accomplished by an additional flag parameter or a separate API function, rather than by just the fact of resizing an input. Teaching a user to rely on such a fact may introduce backward-compatibility issues in the future.

Now it is just our implementation-related problem that resizing an input activates less effective dynamic mode, and this may change in the future. To avoid involuntary switching on dynamic mode when not necessary (say, user application gets batch size from some external source after the model is aready prepared) the user will have to introduce explicit shape comparison in his code that will make the code more complex. Also this optimization logic in user code will turn out obsolete if we change our input resizing implementation.

So I think such an optimization would be more appropriate in ONE code, rather than in user code. And so, keeping the static mode when possible is the best choice.

hyunsik-yoon commented 4 years ago

In short term, we have our current version of API. We have to make this API work. If onert crashes, we have to fix. (#4718) In mid-term, we need to improve the API to the next version. What you propose (additional flag parameter or a separate API function, or making onert handles optimization logic) may change API. We may need to deprecate existing API. So it takes time to discuss such decision.

Let's not mix short and mix terms together. Good idea for mid-term work is always welcome but that does not mean that we should let onert crash without fixing existing bug or spending long time to fix it.

I would like to propose to discuss short term fix in #4718 and longer term topic here.

krayzemli commented 4 years ago

@hyunsik-yoon Indeed, fixing the bug is essential, I'll discuss the bugfix in #4718 . I'm just against standardizing side effects of API calls at the moment, like using resizer API call to signal something implementation-specific.

wateret commented 4 years ago

% FYI I did not fully followed this up, I continued reading from https://github.com/Samsung/ONE/issues/4625#issuecomment-715239728 .

I'm just against standardizing side effects of API calls at the moment, like using resizer API call to signal something implementation-specific.

@krayzemli I respectfully disagree.

The thing with static/dynamic tensors are black box for users.

When we first decided introducing nnfw_set_tensorinfo(or nnfw_apply_tensorinfo), we noticed that users do not care about dynamic tensors. Even some of them do not even know what that is. The reason why we distinguish dynamic/static tensors are just for optimization - performance and memory, not something functional. So there is no side effect.

As implied from @hyunsik-yoon , "running correctly" is the first thing we care. So I think he is taking the optimization back that caused problems first. I think eventually it will work great when the runtime gets smart enough.

krayzemli commented 4 years ago

@wateret

The problem of PR #3998 is just the comparison in the wrong place. PR #4748 is the way I would fix the problem without removing the optimization.

So there is no side effect.

Now looking at the start of this thread, I see the following:

nnfw_prepare();
nnfw_run();  // static input tensor input with [1, 128] 

nnfw_set_input_tensorinfo([1, 128]); // app developer shows and intention to change the shape
nnfw_run();  // *dynamic* input tensor with [1, 128] --> case A)

And that's exactly the standardizing of a side effect: signaling some 'intention' with nnfw_set_input_tensorinfo.

glistening commented 4 years ago

@hyunsik-yoon ... I'm just against standardizing side effects of API calls at the moment, like using resizer API call to signal something implementation-specific.

Good point. I agree. However, we also need to inform the user that there are performance degradation after nnfw_set_input_tensorinfo() because the user does not expect or guess. We may change the rule like there may be performance penalty by calling nnfw_set_input_tensorinfo() in API reference.

hyunsik-yoon commented 4 years ago

@wateret @krayzemli , besides what @glistening suggested, any other suggestion on the current version of API ?

wateret commented 4 years ago

@krayzemli I am still not sure if I got you right, 😅

About "side effect", it looks like we understand this word differently. I see your way minimizes redundant shape inferences, and it looks nice. But I still think it is just about whether optimization is applied or not. Say, modern javascript engines have tiered JITC then how user writes code would have "side effect" and i think our situation is just like that.

What I understood is that @hyunsik-yoon tries to fix the crash first as applying the optimization(MNN way?) looks not so trivial. Are you saying that we have a simple fix(#4748) without disabling optimization, but why would we detour?

Or are you insisting that we need better API? Then could you please elaborate a little?

krayzemli commented 4 years ago

@wateret I think Rule 1 is bad. nnfw_set_input_tensorinfo implementation should be allowed to decide itself whether to use static or dynamic mode with a new given shape (no matter if we fix the optimization or remove it, though I'd prefer to keep it). When a user wants to set dynamic mode explicitly, he should call some separate (not yet present) API function, something like nnfw_set_input_tensor_attribute(input_num, NNFW_ATTRIBUTE_ALWAYS_DYNAMIC, true). Without the attribute, implementation should not guarantee which mode will be used.

We may change the rule like there may be performance penalty by callingnnfw_set_input_tensorinfo()` in API reference.

That would be better. I'd tell something like: changing the tensor shape before nnfw_prepare may give better performance and memory footprint of the network compared to changing the shape after nnfw_prepare and between runs.

glistening commented 4 years ago

@wateret I think Rule 1 is bad. nnfw_set_input_tensorinfo implementation should be allowed to decide itself whether to use static or dynamic mode with a new given shape (no matter if we fix the optimization or remove it, though I'd prefer to keep it). When a user wants to set dynamic mode explicitly, he should call some separate (not yet present) API function, something like nnfw_set_input_tensor_attribute(input_num, NNFW_ATTRIBUTE_ALWAYS_DYNAMIC, true). Without the attribute, implementation should not guarantee which mode will be used.

We may change the rule like there may be performance penalty by callingnnfw_set_input_tensorinfo()` in API reference.

That would be better. I'd tell something like: changing the tensor shape before nnfw_prepare may give better performance and memory footprint of the network compared to changing the shape after nnfw_prepare and between runs.

@krayzemli yes, it is what I understood by "side effect" you mentioned. I prefer this one.

@hyunsik-yoon, @wateret Do we have some code or reason that we must assumes rule 1?

hyunsik-yoon commented 4 years ago

I would like to clarify again for people since there are many talks in this discussion and tough to follow:

Current implementation and what problem we have (ONE runtime crashes):

Please refer to FAQ section of the first comment (https://github.com/Samsung/ONE/issues/4625#issue-721225511)

Short term solution I propose (without API modification):

Rule 1) and Rule 2) of the first comment (https://github.com/Samsung/ONE/issues/4625#issue-721225511)
PR #4718 (open)

Mid-long term solution (API modification could be considered):

@krayzemli, @wateret et al. are in discussion in the above

@glistening,

@hyunsik-yoon, @wateret Do we have some code or reason that we must assumes rule 1?

Please refer to the first comment in this issue. :-)

glistening commented 4 years ago

@hyunsik-yoon Could you please point out the statement? I read the first comment. But I cannot find the answer.

hyunsik-yoon commented 4 years ago

@glistening Maybe https://github.com/Samsung/ONE/issues/4625#issuecomment-708895430 would be more helpful.

@glistening @krayzemli

We may change the rule like there may be performance penalty by calling nnfw_set_input_tensorinfo()` in API reference.

That would be better. I'd tell something like: changing the tensor shape before nnfw_prepare may give better performance and > memory footprint of the network compared to changing the shape after nnfw_prepare and between runs.

To communicate with users, I agree.

@krayzemli Just for information. If you're interested in designing and writing the idea you proposed in this issue, you might want to write a draft.

wateret commented 4 years ago

When a user wants to set dynamic mode explicitly, he should call some separate (not yet present) API function, something like nnfw_set_input_tensor_attribute(input_num, NNFW_ATTRIBUTE_ALWAYS_DYNAMIC, true). Without the attribute, implementation should not guarantee which mode will be used.

@krayzemli I am against introducing this API since as I said before:

Most users don't care about dynamic/static tensors
We are fixing the crash first quickly, then we will do the way you want
So that means when onert gets mature enough, there is no need for that API since it will automatically run properly(use static tensors as much as possible, which is almost always superior)

We may change the rule like there may be performance penalty by callingnnfw_set_input_tensorinfo()` in API reference.

That would be better. I'd tell something like: changing the tensor shape before nnfw_prepare may give better performance and memory footprint of the network compared to changing the shape after nnfw_prepare and between runs.

I also agree. Actually I was not aware that the document says something about static/dynamic tensors. That's definitely my bad as a maintainer of NNFW API. That may have caused all the misunderstanding between us.

So if we remove dynamic/static tensor stuff in the comment, would everything be alright? And putting the words @glistening suggested("there may be performance penalty by calling nnfw_set_input_tensorinfo()") just looks good to me.

krayzemli commented 4 years ago

@wateret

It is not so bad that API mentions dynamic and static modes, since there is a fundamental difference between them: in the dynamic mode, an output shape can't be determined in advance, i.e. user can't rely on querying the shape from nnfw_output_tensorinfo before calling nnfw_run to calculate the necessary buffer size to pass to 'nnfw_set_output'. For now, API gives no mechanism for automatic output tensor allocation, so the user has to guess the buffer size himself somehow. This is not a big problem when only a batch size is varied, since tensor size may be calculated by querying the default output shape after nnfw_prepare and then scaling the size to the batch needed. However, this may not work on dynamic models, whose shapes may be dependent on the input, i.e. can't be inferenced statically. Even though nnfw_set_input_tensorinfo description states that nnfw_prepare will perform static shape inference for all tensors, it does not warn that static inference is not guaranteed to be successful for all tensors. And user even can't query whether it has been successful or not.

TL;DR: User must use nnfw_set_output to provide an output buffer of the size greater or equal than the operand requires, but there is no reliable mechanism to determine that size in all cases as the model may be already in the dynamic mode right after nnfw_prepare even if nnfw_set_input_tensorinfo has not been called at all. I see this as a possible problem in the API.

glistening commented 4 years ago

@krayzemli Thank you for several suggestions. I am afraid that discussion in this issue seemingly went too further from the subject of this issue's creation time. We have to solve the confronting issue. What about creating new issues for your suggestion?

@wateret, @hyunsik-yoon

My suggestion:

Rule 1 - It will be internal ~rule~ developer's rule to provide quick fix for confronting issue which will be effective till we have better solution. dynamic/static will not be mentioned in API reference.

Rule 2 - It seems okay both in internal developer and API reference.

hyunsik-yoon commented 4 years ago

@krayzemli

Even though nnfw_set_input_tensorinfo description states that nnfw_prepare will perform static shape inference for all tensors, it does not warn that static inference is not guaranteed to be successful for all tensors. And user even can't query whether it has been successful or not.

I agree.

@glistening

Rule 1 - It will be internal rule developer's rule to provide quick fix for confronting issue which will be effective till we have better solution. dynamic/static will not be mentioned in API reference. Rule 2 - It seems okay both in internal developer and API reference.

I agree (with the current version of API).

wateret commented 4 years ago

TL;DR: User must use nnfw_set_output to provide an output buffer of the size greater or equal than the operand requires, but there is no reliable mechanism to determine that size in all cases as the model may be already in the dynamic mode right after nnfw_prepare even if nnfw_set_input_tensorinfo has not been called at all. I see this as a possible problem in the API.

Actually I think this is WAY BEYOND the scope of the discussion here. But If I'd share my opinion...

It is true that user must guess the buffer size and that is not so elegant. Here's some reasons I remember.

There is no general way to get the shape inference before run - as some operations output shapes are value dependent, it may change in every run
NN API adopted the same way
Another way(similar to TF Lite interpreter) has also drawback, see below

And the alternative way would be defining an alternative for nnfw_run which keeps its own output buffers internally. After the run users can get the shapes and the pointer with another API after run.

The session must manage output buffer lifetimes
Hard to avoid copies
- Some use case we had was like this. Model outputs are used directly to another model's inputs or even the same model's inputs. In these cases it is likely to have additional copies
- If a user does the inference repeatedly, they always have to call API to get the output pointers while the above way can fix them

Anyways we could switch to this or have both, if we need to.

wateret commented 4 years ago

It is not so bad that API mentions dynamic and static modes, since there is a fundamental difference between them: in the dynamic mode, an output shape can't be determined in advance, i.e. user can't rely on querying the shape from nnfw_output_tensorinfo before calling nnfw_run to calculate the necessary buffer size to pass to 'nnfw_set_output'.

In my personal opinion, I don't agree, due to the reason as I kept saying - I don't think our target users need to know about that stuff(or we may need to make them understand).

So my opinion is removing all about the static/dynamic tensors in the API documentation.

glistening commented 4 years ago

It is not so bad that API mentions dynamic and static modes, since there is a fundamental difference between them: in the dynamic mode, an output shape can't be determined in advance, i.e. user can't rely on querying the shape from nnfw_output_tensorinfo before calling nnfw_run to calculate the necessary buffer size to pass to 'nnfw_set_output'.

In my personal opinion, I don't agree, due to the reason as I kept saying - I don't think our target users need to know about that stuff(or we may need to make them understand).

So my opinion is removing all about the static/dynamic tensors in the API documentation.

@wateret I agree. Then, we may need to remove the mention about dynamic/static in nnfw_input_tensorinfo.

https://github.com/Samsung/ONE/blob/1a528390286267f44ad5852bd6a6dafb863b495b/runtime/onert/api/include/nnfw.h#L230-L247

Then, what about?

Let's have rule 1 as internal rule for a while till we have better solution. (dynamic/static mode will not be explicitly mentioned. Only possible performance penalty will be mentioned.)
Let's remove static/dynamic tensor mention from nnfw_input_tensorinfo() reference the above.
Let's have rule 2 as public rule. (That is, tensorinfo will remember the last tensor info then use it for nnfw_run.)

hyunsik-yoon commented 4 years ago

@krayzemli Could you let us know your opinion?

krayzemli commented 4 years ago

Moving static/dynamic mode details from the API description to a separate technical article would be a good idea. In future, we may want to introduce control options that influence shape inference and memory management, and this technical article would reflect this. However, now any changes in API description without changes in API itself will in vain attract user attention. So I would postpone it till some change/extension of the API.

lemmaa commented 4 years ago

Moving static/dynamic mode details from the API description to a separate technical article would be a good idea. In future, we may want to introduce control options that influence shape inference and memory management, and this technical article would reflect this. However, now any changes in API description without changes in API itself will in vain attract user attention. So I would postpone it till some change/extension of the API.

@krayzemli , I would suggest writing a Design Documents or RFCs for this purpose. It will be a draft of detailed design for implementation, it could be a technical document as it is later, or it could be a user guide with a little touch. Just organizing the discussions so far in this issue would be a good example. :)

Unfortunately we do not yet have an official guide to this. Perhaps through this attempt, we can make a guide together. I have written down some examples of my experiences so far, so please refer to them.

.NET Design Proposals

TensorFlow RFCs

hyunsik-yoon commented 3 years ago

This issue is not active for a month. @krayzemli, if you have any progress with https://github.com/Samsung/ONE/issues/4625#issuecomment-721435900, could you please create another issue for the progress? Thank you.

Samsung / ONE

Make rule for multiple calling of nnfw_set_input_tensorinfo() and nnfw_run() #4625

@krayzemli, @wateret et al. are in discussion in the above

.NET Design Proposals

TensorFlow RFCs