@Spiess This is intentional. In the first case, you create tensors `a` and `b`, then define `ys` as `Seq(a - b)` and take its gradients with respect to `a` and `b`. In the second case, you define `nxs` as `Seq(c.value, d.value)` (explicitly showing the implicit conversion) and `nys` as `Seq(c.value - d.value)`. Each `c.value` call creates a symbolic op in the graph and returns its output tensor. That tensor is different in `val nxs: Seq[Output[Float]] = Seq(c, d)` and in `val nys: Seq[Output[Float]] = Seq(c - d)` because two different ops are created. Thus, the `nys` in this case do not depend on the `nxs`, and their gradients are `null`. I believe the implicit conversion from variable to tensor is making it hard to notice this distinction.
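To make the distinction concrete, here is a minimal sketch (the variable-creation calls, the initializer, and the exact `Gradients.gradients` signature are assumptions based on the tensorflow_scala API, not code from this issue; some versions may also want an explicit gradient data type). Capturing the result of `.value` once and reusing that same `Output` in both `xs` and `ys` preserves the graph dependency, whereas every separate `.value` call (or implicit conversion) creates a fresh read op:

```scala
import org.platanios.tensorflow.api._

// Hypothetical variables; names and creation calls are assumptions.
val c = tf.variable[Float]("c", Shape(), tf.ZerosInitializer)
val d = tf.variable[Float]("d", Shape(), tf.ZerosInitializer)

// Capture the read ops once and reuse the same outputs everywhere.
val cValue: Output[Float] = c.value
val dValue: Output[Float] = d.value

// `ys` is built from the very same outputs that appear in `xs`, so the
// dependency exists in the graph and the gradients are non-null.
val xs: Seq[Output[Float]] = Seq(cValue, dValue)
val ys: Seq[Output[Float]] = Seq(cValue - dValue)
val grads = Gradients.gradients(ys, xs)

// By contrast, each conversion below creates a fresh read op, so these
// `nys` do not depend on these `nxs` and the gradients come back null.
val nxs: Seq[Output[Float]] = Seq(c.value, d.value)
val nys: Seq[Output[Float]] = Seq(c.value - d.value)
val nullGrads = Gradients.gradients(nys, nxs)
```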
Ah right, thanks for clarifying. I assumed that variable value ops still depend on their original variables in the graph, but if that's not the case, this output makes sense.
The value ops do depend on the underlying variable, but when you take the gradient with respect to the output of a value op, it will only be non-null for tensors that depend on the output of that particular value op, not on the underlying variable. It is a bit confusing, but I did it this way so that it’s also consistent with the Python API.
I’ll close this issue, but please reopen if a problem persists.
When using `Gradients.gradients`, the resulting gradients are `null` when the Variables are converted to Outputs separately for `ys` and `xs`.

Example:
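(The example code from the original report is not preserved in this excerpt. The following is a hedged reconstruction of the pattern being described, pieced together from the identifiers quoted in the thread; the constant/variable creation calls, initializers, and the exact `Gradients.gradients` signature are assumptions about the tensorflow_scala API.)

```scala
import org.platanios.tensorflow.api._

// Case 1: plain tensors; gradients of `ys` w.r.t. `xs` are well defined.
val a = tf.constant(1.0f)
val b = tf.constant(2.0f)
val xs: Seq[Output[Float]] = Seq(a, b)
val ys: Seq[Output[Float]] = Seq(a - b)
val gradientsAB = Gradients.gradients(ys, xs)

// Case 2: variables, implicitly converted to outputs separately for
// `nxs` and `nys`.
val c = tf.variable[Float]("c", Shape(), tf.ZerosInitializer)
val d = tf.variable[Float]("d", Shape(), tf.ZerosInitializer)
val nxs: Seq[Output[Float]] = Seq(c, d)
val nys: Seq[Output[Float]] = Seq(c - d)
val gradientsCD = Gradients.gradients(nys, nxs)

println(gradientsAB)
println(gradientsCD)
```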
Here `println(gradientsAB)` outputs proper gradients, `List(Output[...], Output[...])`, while `println(gradientsCD)` outputs `List(null, null)`.