Samsung / ONE

On-device Neural Engine

[onert] Dynamic shape test generation script issue #1315

Open hseok-oh opened 4 years ago

hseok-oh commented 4 years ago

cc/ @hyunsik-yoon @ragmani @wateret

We are using the test generation script from Android for dynamic shape tests. It seems to have some issues, but we need this script for fast implementation. Still, IMO we should find another way someday (using our own API, not NNAPI).

Code maintenance issue

All code in tests/nnapi/nnapi_test_generator/ comes from the Android source code (with small fixes marked by comments in the code and described in README.md: https://github.com/Samsung/ONE/blob/676af94533fb3ef335f55f7f6884fe939b267828/tests/nnapi/nnapi_test_generator/android-10/README.md#fix-for-onert)

But the current dynamic shape test generation script, tests/nnapi/nnapi_test_generator/android-10/dynamic_tensor.py, is a new script.

Invalid usage of the NNAPI spec

The current dynamic shape test changes the input shape at inference time to something different from the input shape described in the model. But this is not allowed by the NNAPI spec: NNAPI only allows setting a shape at each inference when the shape was left unspecified in the model.

Setting the input shape at inference time (ANeuralNetworksExecution_setInput)

https://github.com/Samsung/ONE/blob/676af94533fb3ef335f55f7f6884fe939b267828/runtime/nnapi-header/include/NeuralNetworks.h#L6226-L6235

> Unless the input is omitted, this should be used to specify the dimensions that were left unspecified when the operand was added to the model. All other properties of the type must be the same as specified in the model.

Unspecified shape (ANeuralNetworksOperandType)

https://github.com/Samsung/ONE/blob/676af94533fb3ef335f55f7f6884fe939b267828/runtime/nnapi-header/include/NeuralNetworks.h#L5240-L5248

https://github.com/Samsung/ONE/blob/676af94533fb3ef335f55f7f6884fe939b267828/runtime/nnapi-header/include/NeuralNetworks.h#L5268-L5276
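The spec rule above can be illustrated with a small sketch (plain Python, not ONE or NNAPI code; the helper name is hypothetical). NNAPI encodes an unspecified dimension as 0 at model build time; at execution time, `ANeuralNetworksExecution_setInput` may only fill in those unspecified dimensions, and every dimension the model already fixed must stay the same:

```python
# Hypothetical sketch of the NNAPI rule quoted above: a per-execution
# input shape may only fill in dimensions that were left unspecified
# (encoded as 0) when the operand was added to the model.

def validate_execution_shape(model_dims, exec_dims):
    """Return True if exec_dims is a legal per-execution shape for an
    operand whose model-time dimensions are model_dims (0 = unspecified)."""
    if len(model_dims) != len(exec_dims):
        return False  # rank must match the model
    for model_d, exec_d in zip(model_dims, exec_dims):
        if model_d == 0:
            # unspecified at model build time: any concrete size is fine
            if exec_d <= 0:
                return False
        elif model_d != exec_d:
            # fixed at model build time: must not change at inference time
            return False
    return True

# Legal: filling in the unspecified batch dimension.
print(validate_execution_shape([0, 224, 224, 3], [8, 224, 224, 3]))  # True
# Not allowed by the spec: changing a dimension the model fixed,
# which is what the current dynamic shape test does.
print(validate_execution_shape([1, 224, 224, 3], [8, 224, 224, 3]))  # False
```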

hseok-oh commented 4 years ago

NNAPI's dynamic output shape tests can be enabled by uncommenting the code in the generation script below:

https://github.com/Samsung/ONE/blob/bf58359512ebfd7c1bfd3a705d09241b9b98f0f7/tests/nnapi/nnapi_test_generator/android-10/test_generator.py#L990-L996

It automatically generates a dynamic shape test variation for every currently generated test, but

hyunsik-yoon commented 4 years ago

Thanks for raising the issue.

Code maintenance issue

Correct. Would it be OK if we moved dynamic_tensor.py into, e.g., tests/nnapi/spec?

Invalid usage of the NNAPI spec

Regarding the inputs of a graph:

Fortunately, the current tests do not use inputs with unknown dims; all input shapes are fixed. https://github.com/Samsung/ONE/pull/1243/files#diff-c9a91d2d588f4a6437c81cb682d9ebc0R1 is an example testing whether Add can handle a dynamic tensor input. In this graph, the first op is Reshape, whose inputs (also the inputs of the graph) are all fixed. The output of Reshape is a dynamic tensor, and it becomes an input to Add.
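A short sketch (plain Python, not ONE runtime code; the helper name is hypothetical) of why the Reshape output in that test graph is dynamic even though both graph inputs have fixed shapes: the output shape depends on the *value* of the shape tensor, which is only available at inference time.

```python
# Sketch of TFLite/NNAPI Reshape shape inference: the output shape is
# computed from the value of the second (shape) input, so it cannot be
# known at compilation time if that value arrives only at runtime.

def reshape_output_shape(input_shape, new_shape):
    """Compute the concrete output shape of Reshape, resolving a single
    -1 wildcard from the total element count."""
    total = 1
    for d in input_shape:
        total *= d
    known = 1
    wildcard = None
    for i, d in enumerate(new_shape):
        if d == -1:
            wildcard = i
        else:
            known *= d
    out = list(new_shape)
    if wildcard is not None:
        out[wildcard] = total // known
    return out

# At compile time only the tensor *shapes* are fixed; the concrete output
# shape can be computed only once the shape tensor's *values* are known.
print(reshape_output_shape([2, 3], [3, -1]))  # [3, 2]
```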

Regarding ANeuralNetworksOperandType:

The output shape of Reshape cannot be known at compilation time, so I put [1]. This may cause some issues in the future.

https://github.com/Samsung/ONE/blob/676af94533fb3ef335f55f7f6884fe939b267828/runtime/nnapi-header/include/NeuralNetworks.h#L5240-L5248

The above spec says "whenever possible", so I guess it would be OK.

https://github.com/Samsung/ONE/blob/676af94533fb3ef335f55f7f6884fe939b267828/runtime/nnapi-header/include/NeuralNetworks.h#L5268-L5276

But I'm not sure about the above description.

I looked into how TensorFlow Lite handles this situation: when the above model is read from a tflite file, the output of Reshape is normally a scalar (not the correct shape), but it seems that TFLite does not do any special handling:

hyunsik-yoon commented 4 years ago

With #1318, I think all inputs and intermediate tensors are OK with the current NNAPI spec.

I discussed testing options with @hseok-oh and @chunseoklee:

  1. test with *.mod.py (the current way)
  2. write a test as a JSON file and convert it to tflite
  3. write a TF file to build a model, convert it to pbtxt, and then to tflite by running TF 1.x (or 2.x)

It seems that 1 is better than 2 or 3. Option 3, in particular, involves CI overhead.

How about going with the current way?