KhronosGroup / NNEF-Docs

NNEF public repository

Misc. tensor file format clarifications/suggestions #7

Closed zoeoz closed 4 years ago

zoeoz commented 6 years ago

Clarification: for the two-byte version info, specify that the first byte is the major version and the second byte is the minor version (right now, this is only implied).

Suggestion: define the algorithm codes for the Khronos vendor as the following:

Rationale: conceptually, uncompressed unsigned and signed integer values and logarithmically quantized unsigned and signed values are all different algorithms (formulas). The uncompressed unsigned and signed integer algorithms (codes 0x10 and 0x11, respectively) require no quantization parameters. The logarithmically quantized unsigned and signed value algorithms (codes 0x30 and 0x31, respectively) require only one parameter, i.e., the "max" parameter as defined by the logarithmic_quantize operation. There are plenty of bits to spare in the algorithm code field, so defining six algorithms instead of four poses no trouble, and there is no need to use extra quantization parameter fields to make these distinctions. Note: by convention, the least significant bit of the suggested algorithm codes indicates whether the values stored in the tensor data (i.e., the input to the specified algorithm) are unsigned (0) or signed (1).
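For concreteness, here is a minimal sketch of that sign-bit convention (the constant names are mine, and only the four codes spelled out above are shown; the remaining two of the six are omitted):

```python
# Proposed Khronos algorithm codes (constant names hypothetical; only the
# four codes mentioned above are listed).
KHR_UINT           = 0x10  # uncompressed unsigned integer, no parameters
KHR_INT            = 0x11  # uncompressed signed integer, no parameters
KHR_LOG_QUANT_UINT = 0x30  # logarithmically quantized, unsigned, "max" only
KHR_LOG_QUANT_INT  = 0x31  # logarithmically quantized, signed, "max" only

def is_signed(code: int) -> bool:
    # By convention, the least significant bit gives the signedness of the
    # stored values: 0 = unsigned, 1 = signed.
    return bool(code & 1)

assert not is_signed(KHR_UINT) and is_signed(KHR_INT)
assert not is_signed(KHR_LOG_QUANT_UINT) and is_signed(KHR_LOG_QUANT_INT)
```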

gyenesvi commented 6 years ago

Your rationale for grouping the codes makes sense, but our thinking was a bit different. We wanted to keep the number of distinct codes minimal, defining broader groups and expressing further differences with the parameters. This way, the min and max parameters corresponding to the linear and logarithmic quantization codes are identical, and they easily determine whether the logarithmic case is signed or not. Using an extra parameter for the integer case is not an issue, since there are 32 bytes of unused parameters anyway, which will probably not be used in the future either.
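As a rough sketch of what that decoding looks like for a reader (the codes and field names here are hypothetical, for illustration only):

```python
# Sketch of the broader-groups approach: a few broad algorithm codes, with
# further distinctions read from the parameter fields (names hypothetical).
FLOAT, INTEGER, LINEAR, LOGARITHMIC = 0x00, 0x01, 0x02, 0x03

def is_signed(code: int, params: dict) -> bool:
    if code == INTEGER:
        return bool(params["sign_flag"])  # extra parameter for the integer case
    if code in (LINEAR, LOGARITHMIC):
        return params["min"] < 0.0        # min < 0 implies signed values
    return True                           # plain floats carry their own sign
```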

But actually, I don't see any significant differences between the two ways of encoding the same information.

zoeoz commented 6 years ago

To be sure, both approaches encode all the required information.

In my view, the proposed format is slightly more complex for a reader, since the algorithm code for unsigned and signed values is essentially spread across multiple fields (the algorithm code field and the sign flag parameter field).

Although the linear and logarithmic quantized codes both have min and max fields, it feels a bit awkward to me, since the logarithmic_quantize operation has only a "max" parameter and no "min" parameter. Indeed, the "min" parameter in this case is essentially just a sign flag, as in the integer case. So it seems a bit of a mash-up of ideas, although, as I mentioned, it is perfectly functional.

I'm a visual thinker, so I've created two figures to show how I visualize it in my mind. I'll leave them here for folks to contemplate. Note: I swapped the order of the min and max fields in the linear quantized case of the second figure to show that the "max" field can still be identical between the linear and logarithmic cases, if desired.

[Figure: tensor files 1]

[Figure: tensor files 2]

zoeoz commented 6 years ago

Regarding the comment about using codes to organize into broader groups, I'd like to elaborate that this can also be done by designating higher-order bits in the algorithm code to represent the broad groups. Lower-order bits can designate sub-groups, and the lowest-order bits the type of tensor data stored in the file (and its signedness).

For example: say bit 0 is the signedness (0 = unsigned, 1 = signed) and bit 1 is the data type (0 = float, 1 = integer), so the lowest 2 bits are always:

- 0b00 = unsigned float
- 0b01 = signed float
- 0b10 = unsigned integer
- 0b11 = signed integer

Then pick a higher-order bit to designate compression, say bit 7 (0 = no compression, 1 = compression), and bits 4-6 for the sub-group of compression (0 = linear, 1 = logarithmic, the rest reserved). In this case, we have things organized neatly, for example:

- 0x82 = compressed, linear, unsigned integer data
- 0x92 = compressed, logarithmic, unsigned integer data
- 0x93 = compressed, logarithmic, signed integer data

This gives the reader a quick decode from a single field, as in the sketch below; only the parameters of the corresponding operation node that cannot be stored in the algorithm code ("min" and "max" for code 0x82, and "max" for codes 0x92 and 0x93) need to be fetched from the parameter area.
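For instance, a reader's decode of this example layout fits in a few lines (the helper names are mine):

```python
# Decode of the example bit layout above: bit 0 = signedness, bit 1 = data
# type, bits 4-6 = compression sub-group, bit 7 = compression flag.
SUBGROUPS = {0: "linear", 1: "logarithmic"}  # remaining values reserved

def decode(code: int) -> dict:
    compressed = bool(code & 0x80)
    return {
        "signed":     bool(code & 0x01),
        "data_type":  "integer" if code & 0x02 else "float",
        "compressed": compressed,
        "subgroup":   SUBGROUPS.get((code >> 4) & 0x07) if compressed else None,
    }

assert decode(0x82) == {"signed": False, "data_type": "integer",
                        "compressed": True, "subgroup": "linear"}
assert decode(0x93)["subgroup"] == "logarithmic" and decode(0x93)["signed"]
```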

This kind of arrangement helps ensure that plenty of extra bits remain reserved in the algorithm code for future expansion (there are 16 bits in total to work with, so plenty of room), while keeping things neatly organized into a hierarchy of groups. Admittedly, the codes suggested in my first post were a little more ad hoc.

gyenesvi commented 6 years ago

Thanks for detailing the approach, it makes sense. We'll have to discuss if this adds enough value to the spec to change it in a future version.

zoeoz commented 6 years ago

One last thought: I would actually reserve some of the low-order bits as in the example above for future expansion of data types.

For example, we are experimenting with storing interval bounds on the learned parameters. The interval bounds can be floating-point, signed or unsigned integer, compressed or uncompressed, etc., just as for the non-interval types. So, for example, if bit 2 were defined as the interval data indicator (0 = non-interval, 1 = interval), then in the three lowest-order bits we have:

- 0b000 = unsigned float
- 0b001 = signed float
- 0b010 = unsigned integer
- 0b011 = signed integer
- 0b100 = interval of unsigned floats
- 0b101 = interval of signed floats
- 0b110 = interval of unsigned integers
- 0b111 = interval of signed integers

And then additional enumeration codes as in the previous post for the compressed formats, etc.

We are not proposing the new interval type at this time; that would be premature. It is just to illustrate our thoughts about future expansion in a backwards-compatible manner.

Note that if the number of basic data types grows in this manner, the permutations of different algorithms potentially grow very fast as well, and it may soon become more complicated for readers if the decoding has to come from multiple fields instead of a single enumeration field.

gyenesvi commented 6 years ago

It is not clear to me what you mean by interval bounds; can you give an example? Do you refer to quantization ranges?

zoeoz commented 5 years ago

Yes. Mathematically, an interval [a,b] is the set of real numbers { x | a <= x <= b }, where a and b are real numbers. In practice, we use intervals [u,v], where u and v are some computer-representable values of finite range and precision that satisfy [a,b] \subset [u,v], i.e., u <= a and b <= v.

If u and v are IEEE 754 floating-point values, for example, it will be

[u,v] = [roundDown(a),roundUp(b)],

where roundDown and roundUp are functions that round the exact value of a real number x to a machine-representable number that satisfies the property roundDown(x) <= x <= roundUp(x).
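As a minimal sketch of these rounding functions for IEEE 754 doubles (a real implementation would apply directed rounding to the exact value of x; here we conservatively step one ulp outward from the nearest double):

```python
import math

# Outward rounding for IEEE 754 doubles (math.nextafter requires Python 3.9+).
# Stepping one ulp outward guarantees round_down(x) <= x <= round_up(x),
# even when x is already exactly representable.
def round_down(x: float) -> float:
    return math.nextafter(x, -math.inf)

def round_up(x: float) -> float:
    return math.nextafter(x, math.inf)

def enclose(a: float, b: float) -> tuple:
    """Machine interval [u, v] with u <= a and b <= v."""
    return round_down(a), round_up(b)
```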

Or u and v may be signed or unsigned integers, or linearly or logarithmically quantized values. Whatever the case, u and v share the same representation: they will both be floating-point values, signed integers, unsigned integers, linearly quantized values, logarithmically quantized values, etc.

gyenesvi commented 5 years ago

From your explanation, it appears to me that these intervals are somewhat independent of quantization (so not the same as the quantization interval [min,max]); they can be used even if the data is not quantized, as in the case of floats. But what is the benefit or use of such bounds? Even if the bounds are not explicitly mentioned, the data could come from an interval. How does the consumer use the interval information?

zoeoz commented 5 years ago

I think I misunderstood what you meant by "quantization ranges." However, it sounds like you get the idea. We are using a new artificial neural network training method that produces and consumes interval bounds on the learning parameters. The how and why would take this thread off topic, which I didn't mean to do; if you are interested, I'd be glad to discuss offline. I only meant to present a hypothetical example of why it could be important to reserve some of the low-order bits in the algorithm code for future expansion, and to illustrate how quickly the permutations of algorithm codes can multiply in the future (a motivation for keeping readers simple, so that they can decode the algorithm from a single field rather than from multiple fields, as much as possible).

gyenesvi commented 5 years ago

Sure, I understand the concept; I just wanted to see whether this is something fundamentally new, since quantization approaches also produce interval bounds on the learning parameters. What is not clear to me is how those interval bounds interact with quantization.

It's not important to describe the how and why, only the end result: what data is to be stored in the end, and how does it differ from the currently available quantized/unquantized data formats? I think that part is important for this thread; it would not mean taking it off topic.

zoeoz commented 5 years ago

Section 1.1.4 of the specification says:

"[NNEF] does not define an execution model for neural networks, and hence it does not define what it means to correctly execute a neural network described in NNEF. Although it does define the semantics of operations supposing infinite arithmetics, defining correct execution of actual implementations would require finite arithmetics and underlying representations to be taken into account, which is out of the scope of NNEF."

In my view, this is exactly the right approach. I was pleased to see this separation of concerns between the mathematical definition of the graph structure and the underlying computer implementation of both the training method and the finite arithmetics.

To help aid in discussion, let's define Level 1 as the mathematical definitions and Level 2 as the underlying computer implementations. So at Level 1, everything is specified exactly (with infinite arithmetics), while at Level 2 everything is a finite approximation (using finite arithmetics).

At Level 1, a learning parameter x is a real number, which has infinite precision. Likewise, a Level 1 interval [a,b] is a closed set of real numbers { x | a <= x <= b }. For input, an interval may represent the domain of a learning parameter, such that the learning parameter's contribution to the training solution is expected to be some value contained in the interval. In the worst case, such an input interval may be [-oo,+oo], representing the entire real number line (note that an interval is a set containing only real numbers, so infinity is never a member of the set; with infinite endpoint(s), the interval is still closed, but it is also unbounded). In practice, it may be proved that a narrower interval, such as [-1,1], specifies the domain of a learning parameter. This is not unlike the linear_quantize operation, for example, where at Level 1 the input domain of real values x is essentially specified by the interval [min,max].

For output, the interval may be narrowed much further, in order to isolate the specific subdomain of the learning parameter that actually contributes to the training solution, to within some degree of user-specified tolerance, e.g., [0.00231,0.00232] narrows the domain to within a tiny fraction of relative error, by some hypothetical training method that is able to provide this answer.

Thus, a Level 1 interval [a,b] is mapped to a Level 2 interval [u,v] as described in my September 10 post, where a and b are exact (infinite precision) and u and v are finite machine values of the same kind, such that [a,b] is a subset of [u,v]: every exact (infinite precision) element of [a,b] is an element of [u,v], even though the converse may not hold (there may be elements of [u,v] that are not elements of [a,b], for the reasons illustrated in the September 10 post).

So the end result is that, regardless of whether the interval is input or output, each interval is defined at Level 2 as a pair of finite machine values of the same kind, where "the same kind" means that u and v are both IEEE 754 floating-point values, signed integers, unsigned integers, linearly quantized values, logarithmically quantized values, etc.

gyenesvi commented 5 years ago

So to summarize: do I understand correctly that, in the end, you would like to store 2 values (the interval) for each weight, instead of just 1 value as for regular network weights?

zoeoz commented 5 years ago

Yes. Each tensor element is an interval (2 values) instead of a point (1 value).
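To illustrate (a hypothetical layout for exposition only, not a proposal for the binary format), the pair can be pictured as a trailing dimension of size 2:

```python
import numpy as np

# Hypothetical layout: an interval tensor of shape S is stored as an array of
# shape S + (2,), where [..., 0] holds the lower bound u and [..., 1] holds
# the upper bound v.
weights = np.array([[0.00231, 0.00232],   # a tightly narrowed parameter
                    [-1.0,    1.0]],      # a parameter still at its full domain
                   dtype=np.float32)

lo, hi = weights[..., 0], weights[..., 1]
assert np.all(lo <= hi)  # every element is a valid interval
```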

gyenesvi commented 4 years ago

The spec has been updated to version 1.0.3, in which the quantization parameters have been deprecated in the binary file. Now only an item type code is stored, which indicates whether the data is signed or unsigned where applicable; further quantization algorithm info can be read from the accompanying textual quantization file, which is more flexible for storing custom quantization methods. See the updated spec for more details.

Let me know if I can close this issue.