KhronosGroup / NNEF-Docs

NNEF public repository
Apache License 2.0

Variable update and the "new value" #19

Closed zoeoz closed 4 years ago

zoeoz commented 5 years ago

The NNEF spec says in section 4.8:

After executing all operations in the current graph cycle, the content of variable is replaced with the content of value. The variable itself always refers to the original value at the beginning of the cycle. The new value can be referenced via result. However, result is not a variable tensor, so it cannot be updated again. Thus, each variable can only be updated once in a cycle.

Emphasis on new value is mine. Three key observations here. First, the context seems to imply that the "new value" returned via result is a copy of (or reference to) value, although this is not explicitly defined. Second, it is defined that result must not be a variable, but the spec doesn't say anything about value. Third, Argument Validity only requires the rank and shape of value and variable to be equal.

It seems clear that variable must be a variable. Although this may be a "captain obvious" moment, it should be added to Argument Validity so that trying to update a tensor that was not the result of a variable operation is an error. Additionally, it would seem that value must not be a variable and this should also be added to Argument Validity. For example, updating a variable from another variable, we are assuming based on the spec, should be an error.

Back to the original observations, these two additions to Argument Validity mean that in practice, for example, an implementer can simply return a reference (not a copy) of value in result. This is what we currently do in our NNEF compiler. But to be correct it assumes the two Argument Validity conditions mentioned above.
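To make the "reference, not copy" point concrete, here is a minimal sketch (not NNEF syntax; all names are hypothetical) of an executor in which the update operation's result simply aliases its value argument, and the actual write-back to the variable's storage is deferred until the end of the graph cycle:

```python
class Executor:
    """Hypothetical sketch: update results alias their value argument."""

    def __init__(self):
        self.pending = []  # deferred (variable_name, value) writes

    def update(self, variable, value):
        # Record the write for the end of the cycle, but return the
        # value itself: result is a reference to value, no copy made.
        self.pending.append((variable, value))
        return value

    def end_of_cycle(self, state):
        # Only now do the variables' contents actually change.
        for var, val in self.pending:
            state[var] = val
        self.pending.clear()
```

Under the two Argument Validity assumptions discussed above, returning the reference is safe because nothing can update the result again within the same cycle.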

Of course, if we've made the wrong assumptions we can change our implementation. Either way, the point is some clarity is needed here.

gyenesvi commented 5 years ago

You are right that result is a copy of or a reference to value. Whether it is an actual copy or a reference is an implementation detail, and does not matter from the point of view of a logical description.

I think it does not make sense to have the restriction that the value must not be a variable. Even if it is a variable, it can be referenced as result (without copy) since even the original reference to the variable does not change during one execution of the graph.

I agree that the text can be clarified to say that variable must be a variable and that result must have the same contents as value.

zoeoz commented 5 years ago

The reason we assumed value must not be a variable is because of this statement in the spec:

The new value can be referenced via result. However, result is not a variable tensor, so it cannot be updated again. Thus, each variable can only be updated once in a cycle.

This seems to require that result is actually not the same tensor as value, at least in the case when value is a variable. For example, if value is a variable, it would imply that result must be a copy of value, meaning implementers cannot return a reference to value in this situation.

The other reason is that if value may be a variable, couldn't this cause ambiguity or unexpected results in the variable updates? For example, if one variable can be updated by another variable, what happens if this update chain includes a cycle? If variables can only be updated by non-variables, it seemed (unless we are missing something) that such cycles would never be possible.

gyenesvi commented 5 years ago

Even if result is not the same tensor as value, it can reference the same underlying buffer, so copying can be avoided. But all this is an implementation issue, the high level description of NNEF can be mapped to the implementation in many ways. It may even be that the implementation does not actually differentiate between variables and non-variable tensors when executing a network. Implementation-wise, all tensors could change their value, but some just don't. That's also a valid choice, since once the compiler checks that the input model is well-formed, the low level code does not need to express the high level constraints.

I believe that circular updates are avoided exactly by the fact that during the execution of a single iteration, the variable values are not changed; they all refer to their values at the beginning of the cycle. In terms of dependencies, updating one variable with another directly is no different from using one variable to calculate an update tensor and then updating another variable with that (so you are indirectly using the first variable to update the second, which is exactly what happens in any recurrent network that has updates in it).

zoeoz commented 5 years ago

I believe that circular updates are avoided exactly by the fact that during the execution of a single iteration, the variable values are not changed; they all refer to their values at the beginning of the cycle. In terms of dependencies, updating one variable with another directly is no different from using one variable to calculate an update tensor and then updating another variable with that (so you are indirectly using the first variable to update the second, which is exactly what happens in any recurrent network that has updates in it).

Just to be clear, I was referring to "cycles" in the sense of the directed acyclic graph that is formed by examining the dependencies between variables that are updated at the end of each iteration. For example, if there are two variables a and b and two update operations such that

a = b
b = a

then this is a "cycle" in the update dependency graph because the relationship between a and b in this sense is not a DAG. Therefore, after the current iteration completes, if a is updated first, then both variables will end up with the contents of b. If b is updated first (which seems plausible since NNEF doesn't define the order of updates), then both variables will end up with the contents of a. So NNEF has defined ambiguous behavior.
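The order-dependence described here can be sketched in a few lines of Python (a hypothetical illustration, not NNEF semantics), assuming updates were naively applied in place, one at a time, each reading the current contents:

```python
def run_updates_in_place(state, updates):
    """Naive in-place interpretation: each update reads whatever the
    source variable currently contains, including earlier updates."""
    for var, src in updates:  # updates: list of (variable, source) pairs
        state[var] = state[src]
    return state

# a = b, then b = a: both end up with b's contents.
run_updates_in_place({"a": 1, "b": 2}, [("a", "b"), ("b", "a")])
# -> {'a': 2, 'b': 2}

# b = a, then a = b: both end up with a's contents.
run_updates_in_place({"a": 1, "b": 2}, [("b", "a"), ("a", "b")])
# -> {'a': 1, 'b': 1}
```

Under this (incorrect, as it turns out) reading, the final state depends on the order in which updates are processed.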

This is why we had assumed the spec is trying to imply that variables cannot be updated by other variables, since this appears to be a simple way to eliminate the ambiguity. For example, it would instead require two update operations such that

a = t
b = t

where t is a non-variable tensor. Now at the end of the iteration, regardless what order the variables are updated, both variables a and b will contain the value associated with t and it seems there is no longer any possibility of ambiguous behavior.

I suppose the other alternative might be to allow variables to be updated by other variables as you have mentioned but to then define the order updates must be processed at the end of each iteration so that ambiguous situations cannot arise.

gyenesvi commented 5 years ago

I believe what you are missing is that the updates are actually understood this way:

a_new = b_old
b_new = a_old

Then there is no cycle in the updates (you can only read out the old value of a variable), and the order does not matter.
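This snapshot semantics can be sketched as follows (hypothetical Python illustration): all sources are read from the state as it was at the beginning of the cycle, so the commit order is irrelevant:

```python
def run_updates_snapshot(state, updates):
    """Every update reads the snapshot taken at the start of the
    cycle, never a value written by another update in the same cycle."""
    old = dict(state)  # values at the beginning of the cycle
    for var, src in updates:  # updates: list of (variable, source) pairs
        state[var] = old[src]
    return state

# a = b, b = a now swaps cleanly, in either processing order:
run_updates_snapshot({"a": 1, "b": 2}, [("a", "b"), ("b", "a")])
# -> {'a': 2, 'b': 1}
run_updates_snapshot({"a": 1, "b": 2}, [("b", "a"), ("a", "b")])
# -> {'a': 2, 'b': 1}
```

Compared with the naive in-place reading, the apparent cycle between a and b disappears, because reads and writes are decoupled.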

zoeoz commented 5 years ago

You are correct. I was missing that. If you can somehow make it as crystal clear in the spec as you just did in this comment, that will surely leave no doubt for implementers. For all the time and effort I've put into understanding the update operation, this was not clear to me until now. Thanks.

gyenesvi commented 5 years ago

Okay, I will clarify that.

gyenesvi commented 4 years ago

Fixed in the latest spec revision.