Closed zoeoz closed 4 years ago
You are right that result
is a copy or reference to value
. Whether it is an actual copy or reference that is an implementation detail, and does not matter from the point of view of a logical description.
I think it does not make sense to have the restriction that the value
must not be a variable. Even if it is a variable, it can be referenced as result
(without copy) since even the original reference to the variable does not change during one execution of the graph.
I agree that the text can be clarified to say that variable
must be a variable and that result
must have the same contents as value
.
The reason we assumed value
must not be a variable is because of this statement in the spec:
The new value can be referenced via
result
. However,result
is not a variable tensor, so it cannot be updated again. Thus, each variable can only be updated once in a cycle.
This seems to require that result
is actually not the same tensor as value
, at least in the case when value
is a variable. For example, if value
is a variable, it would imply that result
must be a copy of value
, meaning implementers cannot return a reference to value
in this situation.
The other reason is that if value
may be a variable, couldn't this cause ambiguity or unexpected results in the variable updates? For example, if one variable can be updated by another variable, what happens if this update chain includes a cycle? If variables can only be updated by non-variables, it seemed (unless we miss something) that such cycles will never be possible.
Even if result
is not the same tensor as value
, it can reference the same underlying buffer, so copying can be avoided. But all this is an implementation issue, the high level description of NNEF can be mapped to the implementation in many ways. It may even be that the implementation does not actually differentiate between variables and non-variable tensors when executing a network. Implementation-wise, all tensors could change their value, but some just don't. That's also a valid choice, since once the compiler checks that the input model is well-formed, the low level code does not need to express the high level constraints.
I believe that circular updates are avoided exactly by the fact that during the execution of a single iteration, the variable values are not changed, they all refer to their values at the beginning of the cycle. I believe in terms of dependencies, updating one variable with another directly is not different from using one variable to calculate and update tensor, and then updating another variable with that (so you are indirectly using the first variable to update the second, which is exactly what happens in any recurrent network that has updates in it).
I believe that circular updates are avoided exactly by the fact that during the execution of a single iteration, the variable values are not changed, they all refer to their values at the beginning of the cycle. I believe in terms of dependencies, updating one variable with another directly is not different from using one variable to calculate and update tensor, and then updating another variable with that (so you are indirectly using the first variable to update the second, which is exactly what happens in any recurrent network that has updates in it).
Just to be clear, I was referring to "cycles" in the sense of the directed acyclic graph that is formed by examining the dependencies between variables that are updated at the end of each iteration. For example, if there are two variables a
and b
and two update
operations such that
a
= b
b
= a
then this is a "cycle" in the update dependency graph because the relationship between a
and b
in this sense is not a DAG. Therefore, after the current iteration completes, if a
is updated first, then both variables will end up with the contents of b
. If b
is updated first (which seems plausible since NNEF doesn't define the order of updates), then both variables will end up with the contents of a
. So NNEF has defined ambiguous behavior.
This is why we had assumed the spec is trying to imply that variables cannot be updated by other variables, since this appears to be a simple way to eliminate the ambiguity. For example, it would instead require two update operations such that
a
= t
b
= t
where t
is a non-variable tensor. Now at the end of the iteration, regardless what order the variables are updated, both variables a
and b
will contain the value associated with t
and it seems there is no longer any possibility of ambiguous behavior.
I suppose the other alternative might be to allow variables to be updated by other variables as you have mentioned but to then define the order updates must be processed at the end of each iteration so that ambiguous situations cannot arise.
I believe what you are missing is that the updates are actually understood this way:
a_new = b_old
b_new = a_old
Then there is no cycle in the updates (you can only read out the old value of a variable), and the order does not matter.
You are correct. I was missing that. If you can somehow make it as crystal clear in the spec as you just did in this comment, that will surely leave no doubt for implementers. For all the time and effort I've put into understanding the update
operation, this was not clear to me until now. Thanks.
Okay, I will clarify that.
Fixed in the latest spec revision.
NNEF spec in section 4.8:
Emphasis on new value is mine. Three key observations here. First, the context seems to imply that the "new value" returned via
result
is a copy of (or reference to)value
, although this is not explicitly defined. Second, it is defined thatresult
must not be a variable, but the spec doesn't say anything aboutvalue
. Third, Argument Validity only requires the rank and shape ofvalue
andvariable
to be equal.It seems clear that
variable
must be a variable. Although this may be a "captain obvious" moment, it should be added to Argument Validity so that trying to update a tensor that was not the result of avariable
operation is an error. Additionally, it would seem thatvalue
must not be a variable and this should also be added to Argument Validity. For example, updating a variable from another variable, we are assuming based on the spec, should be an error.Back to the original observations, these two additions to Argument Validity mean that in practice, for example, an implementer can simply return a reference (not a copy) of
value
inresult
. This is what we currently do in our NNEF compiler. But to be correct it assumes the two Argument Validity conditions mentioned above.Of course, if we've made the wrong assumptions we can change our implementation. Either way, the point is some clarity is needed here.