eaplatanios / tensorflow_scala

TensorFlow API for the Scala Programming Language
http://platanios.org/tensorflow_scala/
Apache License 2.0

Gradients are null when deriving by Variables separately converted to Outputs #150

Closed Spiess closed 5 years ago

Spiess commented 5 years ago

When using Gradients.gradients, the resulting gradients are null when the Variables are converted to Outputs separately for ys and xs.

Example:

// Imports assumed for this snippet:
import org.platanios.tensorflow.api._
import org.platanios.tensorflow.api.ops.Gradients

// Case 1: each Variable is converted to an Output once; the same Outputs
// appear in both xs and ys.
val aVar = tf.variable[Float]("a", Shape(20, 3))
val bVar = tf.variable[Float]("b", Shape(20, 3))

val a: Output[Float] = aVar
val b: Output[Float] = bVar

val xs: Seq[Output[Float]] = Seq(a, b)
val ys: Seq[Output[Float]] = Seq(a - b)

val gradientsAB = Gradients.gradients(ys, xs, Float)

// Case 2: the Variables are converted to Outputs implicitly, and separately,
// in nxs and nys.
val c = tf.variable[Float]("c", Shape(20, 3))
val d = tf.variable[Float]("d", Shape(20, 3))

val nxs: Seq[Output[Float]] = Seq(c, d)
val nys: Seq[Output[Float]] = Seq(c - d)

val gradientsCD = Gradients.gradients(nys, nxs, Float)

println(gradientsAB)
println(gradientsCD)

Here, println(gradientsAB) outputs proper gradients, List(Output[...], Output[...]), while println(gradientsCD) outputs List(null, null).

eaplatanios commented 5 years ago

@Spiess This is intentional. In the first case, you convert the variables to tensors a and b once, define ys as Seq(a - b), and take its gradients with respect to those same tensors a and b. In the second case, you define nxs as Seq(c.value, d.value) (explicitly showing the implicit conversion) and nys as Seq(c.value - d.value). Each c.value call creates a symbolic op in the graph and returns its output tensor. That tensor is different in val nxs: Seq[Output[Float]] = Seq(c, d) and in val nys: Seq[Output[Float]] = Seq(c - d), because two different ops are created. Thus, the nys in this case do not depend on the nxs, and their gradients are null. I believe the implicit conversion from variable to tensor makes it hard to notice this distinction.
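
A minimal sketch of the working pattern, with the implicit conversion written out explicitly (the names eVar, fVar, fixedXs, fixedYs, and gradientsEF are placeholders, and the Gradients.gradients call simply mirrors the example above): converting each variable to an Output once and reusing that Output in both xs and ys keeps the gradients non-null.

val eVar = tf.variable[Float]("e", Shape(20, 3))
val fVar = tf.variable[Float]("f", Shape(20, 3))

// Convert each variable to an Output exactly once.
val e: Output[Float] = eVar.value
val f: Output[Float] = fVar.value

// Reuse the same Outputs in both xs and ys, so ys depends on xs.
val fixedXs: Seq[Output[Float]] = Seq(e, f)
val fixedYs: Seq[Output[Float]] = Seq(e - f)

val gradientsEF = Gradients.gradients(fixedYs, fixedXs, Float)
println(gradientsEF)  // expected: List(Output[...], Output[...]), no nulls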

Spiess commented 5 years ago

Ah right, thanks for clarifying. I assumed that variable value ops still depend on their original variables in the graph, but if that's not the case this output makes sense.

eaplatanios commented 5 years ago

The value ops do depend on the underlying variable, but when you take the gradient with respect to the output of a value op, it will only be non-null for tensors that depend on the output of that same value op, rather than on the underlying variable. It is a bit confusing, but I did it this way so that it’s also consistent with the Python API.
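
A minimal sketch of that behavior, assuming (as described above) that each .value call creates a separate value op in the graph; the names vVar, read1, read2, and g are placeholders:

val vVar = tf.variable[Float]("v", Shape(20, 3))

// Two separate value ops reading the same underlying variable.
val read1: Output[Float] = vVar.value
val read2: Output[Float] = vVar.value

// read1 depends on the underlying variable but not on the output of
// read2's value op, so its gradient with respect to read2 is null.
val g = Gradients.gradients(Seq(read1), Seq(read2), Float)
println(g)  // expected: List(null)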

I’ll close this issue, but please reopen it if the problem persists.