Closed carlo-veezoo closed 6 years ago
@csaladin94 This will not really work because unfortunately, due to how control flow ops are handled (mainly while loops), the ops are not really immutable. When constructing while loops the inputs and control inputs to an op may change meaning that lazy vals may end up holding incorrect values. Could you send me the code you used for benchmarking? This way I can see if I can improve performance in different ways (maybe through changing the JNI bindings code).
I see, that's quite unfortunate. Also, I can't share the original code with you, but the following code exhibits basically the same behavior as I've seen on my code:
import org.platanios.tensorflow.api._
object GraphBuildingBenchmark {
def main(args: Array[String]): Unit = {
var a = 1.toTensor.toOutput
for(i <- 0 until 10000) {
a = a + a
}
val session = Session()
}
}
(Btw, this code also leads to a StackOverflowError during pruneControlDependencies)
When profiling using the VisualVM sampler, I get the following:
The 4 native code invocations make up for 90% of the time spent. I haven't analyzed where the calls come from yet, so I'll have a look at that.
I was actually afraid of that stack overflow possibility but never tested it. I'll fix that and I think I may have an idea for how to deal with those native calls but let me look into it and respond soon. Thanks a lot for providing that information.
@csaladin94 I made some edits and improved performance a bit hopefully. Could you please tell me whether your benchmark improves after these changes?
FYI, the new artifacts which include these changes should be up in the snapshot repository now. :)
@eaplatanios Yes, I get the same speed-up as with the val
implementation!
Thanks a lot :)
No problem! I'm super glad it worked out. :)
I changed some
def
s inOp
to(lazy) val
. These simple changes cut the time needed to build my graph to 10% of the time needed before! As the operations are immutable, the changes should not change the behavior in any way. The choice oflazy val
andval
is quite arbitrary, but based on where most time was spent during graph building (assuming that spending more time == accessing the field at least once is more probable)