KhronosGroup / NNEF-Tools

The NNEF Tools repository contains tools to generate and consume NNEF documents
https://www.khronos.org/nnef

tensor_view should pad shape to the rank of operation result #116

Closed lxgo closed 4 years ago

lxgo commented 4 years ago

Hello

While analyzing the code of the sample NNEF runtime, I came to the conclusion that executors that use tensor_view will fail when the rank of any of their operands is smaller than the rank of the result. (According to the specification, this is a valid case, and it is handled successfully by the current implementation of shape inference. In practice, this case is encountered, for example, when executing the Inception v2 - v4 models from the NNEF Model Zoo; I can provide more specific details if necessary.)

I believe that when such a case is encountered, the tensor_view operator[] will advance the shape pointer of the tensor with the smaller rank beyond the end of the valid shape array.

It appears that this issue can be fixed by passing the rank of the result as an additional parameter to the tensor_view constructor and, if it is larger than the actual tensor rank, allocating a larger tensor_view.shape array and padding it with 1s.
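The proposed padding could be sketched roughly as follows. This is a hypothetical, simplified stand-in for tensor_view (the real class in the NNEF runtime also holds a data pointer and looks different); since NNEF broadcasting aligns shapes from the first dimension, the lower-rank shape is extended here with trailing singleton dimensions:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified stand-in for tensor_view: only the shape
// handling is shown, to illustrate the padding idea.
struct padded_shape_view
{
    std::vector<size_t> shape;

    // result_rank is the rank of the operation result; if it exceeds the
    // tensor's own rank, the shape is padded with trailing 1s so that
    // indexing up to result_rank stays within the allocated shape storage.
    padded_shape_view( const std::vector<size_t>& tensor_shape, size_t result_rank )
        : shape(tensor_shape)
    {
        if ( result_rank > shape.size() )
            shape.resize(result_rank, 1);
    }
};
```

For example, a rank-2 tensor of shape {4,3} viewed against a rank-4 result would carry the padded shape {4,3,1,1}.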

[Please note that I have not tested this behavior live with the original C++ code. I found the issue while porting NNEF-Tools to Go and testing the ported code with all models from the Model Zoo. I have implemented the proposed padding in my Go code and believe that the same issue should occur in the original C++ version.]

Regards

gyenesvi commented 4 years ago

I think I understand what you are referring to, but I don't think it should cause problems in the code. I believe that the case you describe with lower ranks for operands than for the result can happen in case of ops that perform broadcasting (binary / ternary select). However, the recursive (by rank) code for the actual execution (function binary/select in operations.h) has an optimization branch for the case when the volume of one operand becomes 1 (which includes the case when the rank becomes 0). Within that branch (in the call to _binary/_select), no further indexing happens to the tensor_view of that operand (dx or dy is either 0 or 1, and becomes 0 for the case when the rank is 0).
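The control flow described above can be sketched as follows. This is a hypothetical, simplified stand-in for the real binary/_binary code in operations.h (the names and signatures are invented for illustration): once an operand's remaining shape is exhausted, its volume is an empty product equal to 1, the optimization branch fires with a stride of 0, and that operand's shape pointer is never dereferenced again.

```cpp
#include <cassert>
#include <cstddef>

// Volume of a shape; for rank 0 this is the empty product, i.e. 1.
static size_t volume( const size_t* shape, size_t rank )
{
    size_t v = 1;
    for ( size_t i = 0; i < rank; ++i )
        v *= shape[i];
    return v;
}

// Innermost flat loop; a stride of 0 replicates a broadcast operand.
static void add_flat( const float* x, size_t dx, const float* y, size_t dy,
                      float* z, size_t n )
{
    for ( size_t i = 0; i < n; ++i )
        z[i] = x[i * dx] + y[i * dy];
}

// Hypothetical recursive (by rank) broadcast addition over raw shape pointers.
static void add_recursive( const float* x, const size_t* xshape, size_t xrank,
                           const float* y, const size_t* yshape, size_t yrank,
                           float* z, const size_t* zshape, size_t zrank )
{
    const size_t xvol = volume(xshape, xrank);
    const size_t yvol = volume(yshape, yrank);
    if ( xvol == 1 || yvol == 1 )
    {
        // Optimization branch: when an operand's volume is 1 (including the
        // rank-0 case, where its shape pointer may already point past the
        // original shape buffer), it is indexed with stride 0 and its shape
        // pointer is never dereferenced.
        add_flat(x, xvol == 1 ? 0 : 1, y, yvol == 1 ? 0 : 1,
                 z, volume(zshape, zrank));
        return;
    }
    const size_t xsub = volume(xshape + 1, xrank - 1);
    const size_t ysub = volume(yshape + 1, yrank - 1);
    const size_t zsub = volume(zshape + 1, zrank - 1);
    for ( size_t i = 0; i < zshape[0]; ++i )
        add_recursive(x + (xshape[0] == 1 ? 0 : i * xsub), xshape + 1, xrank - 1,
                      y + (yshape[0] == 1 ? 0 : i * ysub), yshape + 1, yrank - 1,
                      z + i * zsub, zshape + 1, zrank - 1);
}
```

In this sketch, adding a rank-1 tensor of shape {2} to a rank-2 tensor of shape {2,3} recurses once; at the next level the rank-1 operand's remaining rank is 0, so the fast branch replicates it as a scalar without ever reading its exhausted shape pointer.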

Or is there any other case when you think the indexing would break?

lxgo commented 4 years ago

Yes, you are right! If during the recursive execution the shape pointer moves beyond the end of the shape buffer, the volume becomes 1, so the optimization branch is taken and the shape pointer is not used.

Sorry for the false alarm!