KhronosGroup / NNEF-Tools

The NNEF Tools repository contains tools to generate and consume NNEF documents
https://www.khronos.org/nnef

nnef.load_graph tensor.attr["compression"] is not complete. #100

Closed joe8086 closed 5 years ago

joe8086 commented 5 years ago

Hello friends,

When I use the Python interface to load an NNEF model, I found that a tensor's data type attribute is not shown correctly.

import nnef
graph = nnef.load_graph(r'D:\nnef\model\inception_v1.caffe2.nnef')
graph.tensors['variable_1']

The result shows:

'variable_1': Tensor(name='variable_1', dtype='scalar', shape=[1, 64], data=array([[-2.5272307 , -0.7387814 , -1.4477274 ,  0.7237619 ,  1.7909977 ,
          0.088202  , -1.7899446 , -2.435894  ,  2.62592   , -1.4647362 ,
          ...,
         -2.5942647 , -5.33587   , -5.118693  , -7.9810386 ]],
       dtype=float32), compression={'op-code': 0}, quantization={}),

From the Tensor implementation comment we can see that this dict should have these keys: "op-code" (integer), "bits-per-item" (integer), "min" (scalar), "max" (scalar), "signed" (logical). But we only see op-code, both in float tensors and quantized tensors.

Is this a bug in nnef-tools, or do the missing keys have default values?

gyenesvi commented 5 years ago

First of all, is the problem about the data-type (as you write above) or the compression (as the title of the issue suggests)? Furthermore, is the problem about the C++ API or the Python API? You say you use the Python API, but the documentation you quote is for the C++ API (the two may differ because of the language capabilities).

In this example, the (NNEF) data-type is dtype='scalar', and the corresponding numpy array has dtype=float32, which seems correct. That tells you that the bits per item is 32, and the op-code 0 also matches the float data-type. The min and max are only used when the data is stored as quantized integer values, not floats. Can you show how it looks when you try to load a quantized tensor? Note that even if the tensor has quantization info, it can still be stored as floats, in which case there is no compression info. Compression info is only stored when the data is actually stored as quantized ints. To get that, you need to convert an actually quantized model, such as one from TFLite.
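
For example, a quick way to see which tensors actually carry stored data and compression info (the file path here is just illustrative):

import nnef

graph = nnef.load_graph('quantized_model.nnef')  # hypothetical quantized model
for name, tensor in graph.tensors.items():
    if tensor.data is not None:  # only variables have data stored in the file
        print(name, tensor.data.dtype, tensor.compression)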

joe8086 commented 5 years ago

Hi gyenesvi, thanks. My concern is focused on how to identify a tensor's data type. As the spec says, NNEF supports float16, float32, and float64. Should this data type be queried from graph.tensors['tensor_name'].data.dtype? When I read the C++ code, I think it should be reflected in the Python code, and the bits-per-item should also be queryable via graph.tensors['tensor_name'].compression['bits-per-item'].

I did some tests, but I'm not sure whether graph.tensors['tensor_name'].data.dtype is the correct path.

normal_graph.tensors['variable'].data.dtype
#dtype('float32')
quantize_graph.tensors['variable'].data.dtype
#dtype('int32')
normal_graph.tensors['variable'].compression
#{'op-code': 0}
quantize_graph.tensors['variable'].compression
#{'op-code': 1}
quantize_graph.tensors['variable_1'].data.dtype
#dtype('uint8')
quantize_graph.tensors['variable'].quantization
#{'bits': 32,
# 'max': 0.0,
# 'min': 0.0,
# 'op-name': 'tflite_quantize',
# 'scale': 0.0001900226779980585,
# 'zero_point': 0}
quantize_graph.tensors['variable_1'].compression
#{'op-code': 1}
quantize_graph.tensors['variable_1'].quantization
#{'bits': 8,
# 'max': 2.7828609943389893,
# 'min': -3.3951563835144043,
# 'op-name': 'tflite_quantize',
# 'scale': 0.024322902783751488,
# 'zero_point': 141}
gyenesvi commented 5 years ago

The Python interface is somewhat different from the C++ one because Python has numpy arrays which carry a lot of information about tensor data (such as data-type, from which the number of bits follows). To improve similarity, I have added the 'bits-per-item' entry to the 'compression' dictionary in the Python interface as well.

However, it is still not clear to me what the problem is with the above examples; they seem okay to me. graph.tensors['variable'].data.dtype returns the dtype of the numpy array that stores the data (such as np.float32, np.int32, np.uint8, etc.), as loaded from the NNEF file. That is not the same as graph.tensors['variable'].dtype, which is the NNEF data type and would be 'scalar' in all cases, irrespective of whether the data is quantized or not.
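
To illustrate the distinction with the tensors from your examples:

t = quantize_graph.tensors['variable_1']
t.dtype       # NNEF data type: 'scalar', even though the storage is quantized
t.data.dtype  # numpy storage dtype: dtype('uint8') in this case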

Furthermore, there is a difference between 'compression' and 'quantization' info. The quantization info describes how the data is used during computation (inference). The compression describes how it is stored in the file. The two can be different. In case of TFLite export, the quantization info describes the relevant info for you. The compression info merely says that the data is stored as plain integers (op-code 1 means integer values without min/max range). You can use the quantization info to know how those integers could be interpreted, how they map to real values.
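
For example, assuming the usual TFLite affine scheme real = scale * (stored - zero_point) (this formula comes from TFLite's quantization, not from the nnef module itself), you could reconstruct approximate real values like this:

import numpy as np

t = quantize_graph.tensors['variable_1']
q = t.quantization  # contains 'scale' and 'zero_point', as shown above
real = q['scale'] * (t.data.astype(np.float32) - q['zero_point'])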

Let me know if this makes things more clear.

joe8086 commented 5 years ago

Hi gyenesvi,

Thanks for your answer. I list my understanding of the different data type conditions below; please help to check whether these rules are correct or not.

compression == 0x0: float data type; the bits-per-item can be queried from graph.tensors['variable'].data.dtype.

compression == 0x1: integer data type; the bits-per-item can be queried from graph.tensors['variable'].data.dtype. When compression == 1, it may be quantized data, so the quantization detail is contained in graph.tensors['variable'].quantization. In this situation, the quantization rule is defined by graph.tensors['variable'].quantization['op-name']. NNEF supports any kind of quantization rule in this branch, but so far the converter only supports 'tflite_quantize' as a user-defined quantization rule.

compression == 0x10: the data is stored as quantized data with the linear_quantize rule.

compression == 0x11: the data is stored as quantized data with the logarithmic_quantize rule (a summary sketch of these codes follows below).
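
If these rules are right, they could be collected in a small lookup table like the sketch below (this helper is hypothetical, just to summarize my understanding; the 0x10/0x11 names follow the NNEF binary format codes and are an assumption here):

COMPRESSION_OP_CODES = {
    0x00: 'float values',
    0x01: 'plain integer values (interpretation, if any, in tensor.quantization)',
    0x10: 'linearly quantized values',
    0x11: 'logarithmically quantized values',
}

def describe_compression(tensor):
    # Hypothetical helper, not part of the nnef module.
    return COMPRESSION_OP_CODES.get(tensor.compression['op-code'], 'unknown')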

gyenesvi commented 5 years ago

As I said in my last note, you can now also access the bits-per-item info as graph.tensors['variable'].compression['bits-per-item'] from Python. Also, as I said, compression and quantization are independent. You can have any combination of quantization and compression, so you can have quantization details even in the case of float data.

Compression is only present for tensors that are saved as part of the model (variables). Quantization may also be available for activations, not just for variables. And yes, you can access it as graph.tensors['tensor-name'].quantization['op-name'].
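
For example, with the quantized graph from above:

t = quantize_graph.tensors['variable_1']
t.compression['bits-per-item']  # storage bits, newly added to the Python interface
t.quantization['op-name']       # 'tflite_quantize'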

In case of conversion from TFLite, compression = 0x01 will be used, so that the integer data is identical to the one stored in TFLite files. Furthermore, we set the quantization info so that it reflects the quantization parameters in the original TFLite model.

We understand that having both compression and quantization, which are somewhat related but different, can be confusing, and we are thinking of deprecating the compression parameters, keeping only the compression code (0x00 and 0x01) and the quantization info.

joe8086 commented 5 years ago

Thanks gyenesvi,

That gives me a clearer understanding of the details.

Thanks. 8086