eaplatanios / tensorflow_scala

TensorFlow API for the Scala Programming Language
http://platanios.org/tensorflow_scala/
Apache License 2.0
937 stars 95 forks source link

Speed-up graph building significantly #47

Closed carlo-veezoo closed 6 years ago

carlo-veezoo commented 6 years ago

I changed some defs in Op to (lazy) val. These simple changes cut the time needed to build my graph to 10% of the time needed before! As the operations are immutable, the changes should not change the behavior in any way. The choice of lazy val and val is quite arbitrary, but based on where most time was spent during graph building (assuming that spending more time == accessing the field at least once is more probable)

eaplatanios commented 6 years ago

@csaladin94 This will not really work because unfortunately, due to how control flow ops are handled (mainly while loops), the ops are not really immutable. When constructing while loops the inputs and control inputs to an op may change meaning that lazy vals may end up holding incorrect values. Could you send me the code you used for benchmarking? This way I can see if I can improve performance in different ways (maybe through changing the JNI bindings code).

carlo-veezoo commented 6 years ago

I see, that's quite unfortunate. Also, I can't share the original code with you, but the following code exhibits basically the same behavior as I've seen on my code:

import org.platanios.tensorflow.api._

object GraphBuildingBenchmark {
  def main(args: Array[String]): Unit = {
    var a = 1.toTensor.toOutput

    for(i <- 0 until 10000) {
      a = a + a
    }

    val session = Session()
  }
}

(Btw, this code also leads to a StackOverflowError during pruneControlDependencies)

When profiling using the VisualVM sampler, I get the following:

profile

The 4 native code invocations make up for 90% of the time spent. I haven't analyzed where the calls come from yet, so I'll have a look at that.

eaplatanios commented 6 years ago

I was actually afraid of that stack overflow possibility but never tested it. I'll fix that and I think I may have an idea for how to deal with those native calls but let me look into it and respond soon. Thanks a lot for providing that information.

eaplatanios commented 6 years ago

@csaladin94 I made some edits and improved performance a bit hopefully. Could you please tell me whether your benchmark improves after these changes?

eaplatanios commented 6 years ago

FYI, the new artifacts which include these changes should be up in the snapshot repository now. :)

carlo-veezoo commented 6 years ago

@eaplatanios Yes, I get the same speed-up as with the val implementation! Thanks a lot :)

eaplatanios commented 6 years ago

No problem! I'm super glad it worked out. :)