KhronosGroup / NNEF-Docs

NNEF public repository
Apache License 2.0

Data vs. Operation #1

Closed zoeoz closed 6 years ago

zoeoz commented 6 years ago

I’d like to draw attention to a few subtle inconsistencies about the way the document currently describes data and operations.

Section 2.1 begins with a very concise definition that a computational graph consists of “data nodes” and “operation nodes.” A corresponding figure depicts external (E), constant (C), variable (V), and tensor (T) data nodes (squares) connected to operation nodes (ellipses) and vice-versa. All of this made complete sense to me, and I thought it was well written.

Section 2.3 briefly introduces the term “computational node,” which doesn't appear to be used again. I believe this is the same as an “operation node,” since by definition a “data node” doesn’t perform an operation (or computation). Not sure what the intent was here. It seems to be a distinction without a difference.

In Section 2.5 nodes are described as “operations” and “tensors,” not as “operations” and “data.” It seems the document has made a subtle shift in terminology here. I wouldn’t necessarily conflate “tensors” with “data,” since Section 2.1 defines different kinds of data, namely external, constant, variable, and tensor (essentially, “temporary” data). I think part of the issue is that “tensor” is defined as a “data node” in Section 2.1, while in Section 2.5 it seems to be referring to data type.

Section 4 briefly uses the term “computational primitive,” analogous to the curious use of “computational node” in Section 2.3. Section 4 also uses the term “graph primitive” to describe the same thing as a “computational primitive.” “Graph primitive” seems more appropriate to me, since by the previous definitions, only operation nodes can be primitives.

Section 4.1 introduces a clash of definitions. Section 2.1 defines an “input data node” of the external, constant, and variable variety. Now Section 4.1 describes external, constant, and variable as “operations.” I think the document should choose one definition or the other, i.e., are external, constant, and variable data nodes or operation nodes? They shouldn’t be identified as both.

Conceptually, if I was going to do a class hierarchy based off definitions from Section 2.1, it would look like this: Data and Operation are derived from Node. External, Constant, Variable, and Temporary are derived from Data. Primitive and Compound are derived from Operation. The operations in Sections 4.2-4.7 are derived from Primitive, and the operations in Section 4.8 are derived from Compound.
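As a rough Python sketch (the class names are mine, purely illustrative, not taken from the spec):

```python
# Hypothetical sketch of the class hierarchy I read out of Section 2.1;
# the names are my own, not the spec's.

class Node: ...

class Data(Node): ...            # Section 2.1: data nodes
class External(Data): ...
class Constant(Data): ...
class Variable(Data): ...
class Temporary(Data): ...       # the "tensor" data node of Section 2.1

class Operation(Node): ...       # Section 2.1: operation nodes
class Primitive(Operation): ...  # operations of Sections 4.2-4.7
class Compound(Operation): ...   # operations of Section 4.8
```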

gyenesvi commented 6 years ago

Thanks for the review and suggestions for improvements. Indeed, there is a mix of terminologies: as the document evolved over time, some naming changed, and it would be beneficial to make it unambiguous. For example, since all data is in the form of tensors, it may be better to use the word 'tensor' everywhere instead of 'data node' (except where tensors are introduced, where it must be specified that they represent the data in the graph).

However, I believe the 'class hierarchy' that you arrive at does not cover the picture completely, so let me clarify this point. external, constant and variable are special operations whose purpose is to introduce data (tensors) that is not the result of operations performing actual computation (not all operations perform actual computation; some just rearrange data, for example reshape, transpose, split and concat). So there is a one-to-one mapping between external, constant and variable tensors and the operations that produce (introduce) them as results.

These must be operations because the graph definition syntax consists of a sequence of operations only, and tensors are defined implicitly as the inputs and outputs of operations. In essence, they are analogous to constructors in object-oriented languages.

Externals, constants and variables are the same kind of data as 'temporaries'; all of them are tensors. The only specialties are that externals come from outside the graph, variables preserve their state between executions of the graph and can change over time, and constants always have the same contents between subsequent executions of the graph. Some implementations may differentiate them for optimization purposes, while others may treat them all equally, as regular tensors.
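To illustrate the analogy with constructors, here is a rough Python sketch (the names and structure are illustrative only, not the NNEF API):

```python
# Illustrative sketch only: every tensor is introduced as the result of
# some operation; external/constant/variable introduce tensors without
# performing any computation, much like constructors.

class Tensor:
    def __init__(self, producer):
        self.producer = producer  # the operation that introduced this tensor

def external(shape):             # data supplied from outside the graph
    return Tensor(producer="external")

def variable(shape, label):      # state preserved between graph executions
    return Tensor(producer="variable")

def conv(input, filter):         # an operation performing actual computation
    return Tensor(producer="conv")

# A graph definition is just a sequence of operations; the tensors
# x, w and y are defined implicitly as their results.
x = external(shape=[1, 3, 224, 224])
w = variable(shape=[16, 3, 3, 3], label="conv1/filter")
y = conv(x, w)
```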

Let me know if this clarifies things, we will try to make the wording better to avoid confusion!

zoeoz commented 6 years ago

Yes, that is helpful. I mentioned the document should choose one definition or the other, because the view of external, constant, and variable as operations, like you mention above, also makes sense. In that case, the language in Section 2.1 is what may need to be reconsidered (not the other way around). I think the fundamental design is solid. Providing a consistent definition and use of terminology throughout the entire document will no doubt be very beneficial.

gyenesvi commented 6 years ago

The new spec is out, with improved wording. Let me know what you think!

zoeoz commented 6 years ago

The language regarding operation nodes and compound vs. primitive operations in Chapter 3 and after is much improved and seems to be consistent.

I found myself at section 2.5 a little thrown off again, because this is where the document stops using the term “data node” and starts using the term “tensor” as a node type, e.g., the definition of computational graph introduces “nodes that are either operations or tensors.”

I did notice on a careful re-read a sentence in section 2.1 that says “data nodes represent multi-dimensional arrays (such as vectors, matrices, or arrays of higher dimension), called tensors.” So there is a link between “data node” and “tensor” as a node here. However, I think my understanding comes best from your Feb 16 forum response (above) to my original question. That really made the relationship and design intentions of the specification crystal clear.

Here are some notes I had about the rest of the new document:

p. 13 definition implies tensor< tensor > and tensor< string > are valid syntax. In the subsequent text about semantics, I don’t remember seeing whether these are valid or not. I could possibly imagine a tensor of strings, but I have a hard time understanding what tensor< tensor > would be.

p. 17 under Compound Types, Tensor Types: “Tensors represent data on which run-time computations is preformed.” Should be “performed?”

p. 19, third bullet under declarations: “Fragment results must be either all tensors or all attributes to clearly separate run-time operations from compile-type attribute computations.” Should be “compile-time attribute computations?”

p. 23, second bullet under Arrays: “If the beginning is less than or equal to the end, the result is the empty array.” I think that should be “If the beginning is greater than or equal...”

p. 31, Argument Validity: The semantics and use of batch and channel dimensions, and of broadcasting, are not clear to me. The specification seems to assume I should already know this. Probably a Python thing? I am a C++ programmer, so I’m not familiar with it.

zoeoz commented 6 years ago

BTW, I saw this morning on p. 18 that it says “tensor types are built from a single data-type, which must be a primitive type,” and above that primitive types are defined integer, logical, scalar, and string. So this means my comment above about tensor-type-spec on p. 13 is addressed, i.e., tensor< tensor > is valid syntax but semantically invalid (as I would expect).

This could be made invalid syntax by adding the new grammar production:

primitive-type-name ::= “integer” | “scalar” | “logical” | “string” | “?”

and then modifying the following existing definitions:

type-name ::= “tensor” | primitive-type-name

tensor-type-spec ::= “tensor” “<” [primitive-type-name] “>”

zoeoz commented 6 years ago

Actually, it seems “tensor” should be removed from type-name altogether, as it doesn’t seem that is semantically valid, either. In other words, “tensor” must always appear as part of tensor-type-spec. For example, even an unbound tensor must appear as “tensor< >” as opposed to just “tensor”.

gyenesvi commented 6 years ago

Thanks for your review and report of errors in the spec.

Indeed, 'data node' and 'tensor' are interchangeable in the spec. It's just that in the intro I use 'data node' to give an easy-to-understand overview, and then later specify that all data nodes are actually tensors, and from that point on use the term 'tensor' (hopefully consistently).

You are right about the type name "tensor"; it should be removed from the definition of <type-name>, as it is only valid as part of <tensor-type-spec>, so tensor<tensor> or just tensor was meant to be syntactically invalid. tensor<string> is syntactically and semantically valid; I believe TensorFlow actually implements it. There was no point in excluding it from NNEF; the type system is more regular this way.

About broadcasting and batch/channel dimensions: they are more of a deep learning framework/tensor concept than a Python one, and they are perhaps defined a bit late in the document, for the binary elementwise and convolution operations (sections 4.2.2 and 4.3.1).
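As a quick illustration of the broadcasting idea (using NumPy here since the concept is the same; this is not NNEF code):

```python
import numpy as np

# A batch of 2 images, 3 channels, 4x4 spatial extent: shape [N, C, H, W].
x = np.zeros((2, 3, 4, 4))

# A per-channel bias has shape [1, C, 1, 1]; broadcasting implicitly
# repeats it along the batch and spatial dimensions so it can be
# added elementwise to x.
bias = np.arange(3.0).reshape(1, 3, 1, 1)

y = x + bias
print(y.shape)  # the result keeps the full shape: (2, 3, 4, 4)
```

Every entry of channel c in the result equals the bias for channel c, regardless of batch index or spatial position.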

You are right about the rest of the typos, we'll correct those.

zoeoz commented 6 years ago

Sounds good. I’m going to close the issue.