Closed jnorwood closed 5 years ago
Just for clarification, NNEF does specify data-types for tensors, but the data-types are not exactly what they usually are in frameworks. NNEF has tensors of `scalar`, `integer` and `logical` data-type (note that `integer` is named `coordinate` in the provisional spec, but it is going to be renamed in the final). These data-types specify what the tensor is used for, not how the data is represented, and you are right, that is a conscious decision. `scalar` tensors are used for arithmetic, `logical` tensors are for the results of comparisons and for branching, and `integer` tensors are used for storing indices into other tensors. The final spec will have a clearer description of tensor data-types.
The representation of these tensors is totally up to the inference engine running the network, and it's not (a core) part of the description. For example, scalar tensors can be represented with 8-bit integers using fixed-point arithmetic on one target device and with floats on another target device. But that's a decision of the target device, not of the designer of the network.
However, the designer of the network can give some hints about the precision required to represent those tensors to make the network work, and that's what the quantization file is for. It can tell for each tensor individually, not only how many bits are required to represent it, but also the algorithm used to interpret those bits. For example the same 8 bit integers may be used with linear or logarithmic quantization. The algorithms are described by the corresponding compound operations, see chapter 7 of the spec for details. This is for the activation tensors used during computations.
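As an illustration of such hints, a quantization file might associate tensor names with the compound operations that describe their interpretation. The entry syntax below is a sketch based on the description above; the tensor names and the exact parameter lists are hypothetical, so consult the spec (chapter 7) for the authoritative form:

```
# hypothetical quantization file entries; tensor names are illustrative
"conv1_output": linear_quantize(min = -3.0, max = 3.0, bits = 8);
"conv2_output": logarithmic_quantize(max = 4.0, bits = 8);
```

Each entry says how many bits the named activation tensor needs and which algorithm interprets those bits, without changing the tensor's `scalar` data-type in the graph itself.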
The quantization in the tensor file has a different purpose: simply to store the data in a compressed format. Of course, it is related to the quantization algorithms used for computation of activations, but they are independent things. We will also emphasize this in the final spec with clear separation of the quantization algorithm for activations and the compression algorithms to store the weights in the binaries, along with their relation.
Let me know if the final spec makes things more clear regarding this issue!
Yes it is clear.
I noticed that TensorFlow is now providing separately named quantized operations that operate on the integer quantized values, rather than trying to specialize the parameter types of the corresponding scalar operations. For example, they now have quantized_conv2d. https://www.tensorflow.org/api_docs/python/tf/nn/quantized_conv2d
The NNEF conv2d specifies scalar tensor values in the grammar, so it would not be appropriate for specifying quantized operations. This feels related, but perhaps it is a separate issue.
In NNEF, the fact that something may be implemented using integers is a detail of the implementing engine. It is not indicated by the data-type; it is still a scalar operation. Instead, the quantization file describes how that scalar may be restricted to quantized values. So the name does not need to reflect how the operation is actually implemented.
No, matmul/conv cannot be applied to integer tensors. An integer tensor in NNEF does not mean that it's a quantized representation; it instead expresses the semantics of those values. Integer tensors mean that the values in the tensor are integer offsets into other tensors, such as argmax values of pooling/reduce operations. You cannot perform matrix multiplication on such values. Every value that you can perform arithmetic on is a scalar. Everything that you can use for logical operations is a logical. Everything that you can use for indexing other tensors is an integer tensor.
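The distinction can be sketched in a few lines (using NumPy purely for illustration; this is not NNEF code):

```python
import numpy as np

# A tensor with "scalar" semantics: values you do arithmetic on.
activations = np.array([0.2, 0.9, 0.1, 0.5], dtype=np.float32)

# A value with "integer" semantics in the NNEF sense: not quantized
# data, but an offset into another tensor (here, an argmax index,
# as produced by pooling/reduce operations).
index = int(np.argmax(activations))

# Indices are only meaningful for indexing other tensors;
# doing arithmetic like matmul on them has no semantic meaning.
selected = activations[index]
```
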
How the values are represented is up to the implementation. An implementation may represent scalar values as 8-bit integers, for example. But that's not reflected in the data-type; instead, it is reflected in the quantization info for that tensor, such as linear_quantize(min = -3.0, max = 3.0, bits = 8), which indicates that the tensor can be represented with 8-bit integers if an implementation wants to do so.
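A minimal sketch of what linear quantization of this kind might compute, assuming a straightforward map of the [min, max] range onto 2^bits - 1 levels (the exact formula is given by the compound operations in chapter 7 of the spec, which may differ in details such as rounding):

```python
import numpy as np

def linear_quantize(x, min_val, max_val, bits):
    # Clamp to the declared range, then map onto integer levels.
    levels = 2 ** bits - 1
    x = np.clip(x, min_val, max_val)
    q = np.round((x - min_val) / (max_val - min_val) * levels)
    return q.astype(np.uint8)

def linear_dequantize(q, min_val, max_val, bits):
    # Map the integer levels back into the [min, max] range.
    levels = 2 ** bits - 1
    return q.astype(np.float32) / levels * (max_val - min_val) + min_val
```

With min = -3.0, max = 3.0, bits = 8, a scalar value round-trips through an 8-bit code with an error bounded by half a quantization step (6.0 / 255 ≈ 0.024).
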
If you want a literal to be a scalar value, then yes, you need a decimal point, but I don't really see where you would need literals for matmul (unless you want to multiply by a constant matrix).
It appears to be a conscious decision that the NNEF spec sets aside the issue of specifying data types for variable assignments and parameters; however, the tensor file format does allow specifying the number of bits.
I'm wondering what the intended way is to specify different-sized data variables in the graph descriptions. Do we need to come up with our own variable-name mangling, for example myval1_fp16 or myval2_s8, and then post-process these elsewhere on our own?