eaplatanios / tensorflow_scala

TensorFlow API for the Scala Programming Language
http://platanios.org/tensorflow_scala/
Apache License 2.0

Memory leak due to Graph and Op not being reclaimed #161

Open huynhjl opened 5 years ago

huynhjl commented 5 years ago

Hi Anthony,

The following code leaks Graph and Op objects in my environment:

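  import org.platanios.tensorflow.api._

  def test(): Unit = {
    val graph = Graph()
    val session: Session = Session(graph)
    try {
      // Create many placeholder ops inside the graph.
      for (_ <- 0 until 1024 * 10) tf.createWith(graph)(tf.placeholder[Float](Shape(-1, 2, 2)))
    } finally {
      // Close both handles; the Graph and Op objects should be reclaimable afterwards.
      session.close()
      graph.close()
    }
  }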

If I call the test method repeatedly, the Graph and Op objects are never garbage collected. VisualVM shows the GC root going through org.platanios.tensorflow.api.utilities.Disposer.records.

Here is an example that should compile: https://gist.github.com/huynhjl/00a9ee6958f1b0143b701eb7b2563005

Let me know if I'm doing anything wrong.
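For anyone trying to reproduce this without a profiler, here is a minimal sketch of a heap probe. HeapProbe, usedHeapBytes and sample are hypothetical names made up for illustration and are not part of tensorflow_scala. It only measures JVM heap, so memory that TensorFlow allocates natively will not show up, and System.gc() is only a hint to the JVM.

```scala
object HeapProbe {
  // Used JVM heap in bytes, measured after requesting a GC.
  // System.gc() is only a hint, so treat the numbers as approximate.
  def usedHeapBytes(): Long = {
    System.gc()
    val rt = Runtime.getRuntime
    rt.totalMemory() - rt.freeMemory()
  }

  // Runs `body` `rounds` times and samples the used heap after each round.
  // A series that keeps growing suggests objects are being retained.
  def sample(rounds: Int)(body: () => Unit): Vector[Long] =
    Vector.fill(rounds) {
      body()
      usedHeapBytes()
    }
}
```

Something like HeapProbe.sample(20)(() => test()) could then be printed round by round; if the values keep climbing instead of levelling off, that would be consistent with the retention seen in VisualVM.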

huynhjl commented 5 years ago

As far as I can tell from a memory dump, Disposer.records indirectly holds a reference to the graph, which prevents it from being garbage collected. The cause is that Session.apply adds a closeFn function to graph.nativeHandleWrapper.preCleanupFunctions so that the graph can clean up the session and close the graph.reference, but that registration in itself prevents garbage collection.
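The reference chain described above can be modeled without TensorFlow. The following is a minimal sketch and not the library's actual code: Registry, Resource, registerLeaky and registerWeak are hypothetical names, and it assumes the cleanup function captures the object it is supposed to clean up. A globally reachable registry that stores such a closure keeps the object strongly reachable; capturing only a weak reference avoids that.

```scala
import java.lang.ref.WeakReference
import scala.collection.mutable

// Hypothetical stand-in for a global cleanup registry (not the library's Disposer).
object Registry {
  val records: mutable.Map[Long, () => Unit] = mutable.Map.empty

  def register(id: Long, cleanup: () => Unit): Unit = {
    records.update(id, cleanup)
  }

  // Removes the record and runs its cleanup function, if one is registered.
  def dispose(id: Long): Unit = {
    records.remove(id).foreach(cleanup => cleanup())
  }
}

// Hypothetical resource standing in for an object that owns a native handle.
final class Resource(val id: Long) {
  var closed: Boolean = false
  def close(): Unit = { closed = true }
}

object LeakSketch {
  // Leaky: the closure captures `r`, so Registry.records -> closure -> r
  // keeps `r` strongly reachable until dispose(r.id) is called.
  def registerLeaky(r: Resource): Unit =
    Registry.register(r.id, () => r.close())

  // Safer: the closure captures only a weak reference, so the registry
  // alone does not keep `r` alive.
  def registerWeak(r: Resource): Unit = {
    val ref = new WeakReference(r)
    Registry.register(r.id, () => Option(ref.get()).foreach(_.close()))
  }
}
```

With registerLeaky, the only way to release the object is to remove its record explicitly. Whether a weak reference or an explicit removal on close is the better fit for Session.apply is a design question this sketch does not answer.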

mandar2812 commented 5 years ago

@eaplatanios I think this issue is the most relevant one when considering large-scale experimentation/training and hyper-parameter tuning using TF_Scala.

Currently, implementations such as TunableTFModel in the DynaML API rely on graph.close() to free up resources.

Let us know if there is any way I, @huynhjl or others can help resolve this, although my understanding of the codebase is still fairly high-level.

cc @sbrunk @lucataglia @DirkToewe

DirkToewe commented 4 years ago

I'm going to look into it.

eaplatanios commented 4 years ago

I'm sorry, I've been off TF Scala for a while, working on other projects. @mandar2812 @DirkToewe @sbrunk if you're interested, we could have a conference call at some point to help you understand the codebase at a deeper level. Just let me know and we can plan it.

mandar2812 commented 4 years ago

@eaplatanios I would love that! Maybe we should make a Doodle and fix a time that's okay for all of us? What do you think, @sbrunk @DirkToewe?

DirkToewe commented 4 years ago

A tour of the project would be greatly appreciated! I just need like two days to take a look at the code again (it's been a while) so I can ask better questions. There is an unofficial TensorFlow(JS) Discord server that we could use to coordinate and talk.

sbrunk commented 4 years ago

I've been a bit disconnected from TF Scala since I left academia, but I'd still be interested in joining a call about the codebase. I'm also super interested in what you think about Swift for TF, since I've seen you've worked with it too :)

eaplatanios commented 4 years ago

Sounds good to me! And yes, I've been working on Swift for TF for quite some time now and would also be happy to talk about that. :) Does someone want to coordinate this? A Doodle poll may be a good start. I'm sorry, but I've been super busy lately.

mandar2812 commented 4 years ago

@eaplatanios @sbrunk @DirkToewe I'll set up a Doodle poll this weekend.

eaplatanios commented 4 years ago

@mandar2812 just a gentle ping about the poll. We can also schedule it informally here. My schedule is quite flexible over the next week.

mandar2812 commented 4 years ago

@eaplatanios sorry for this huge delay in setting up the Doodle :D. I'm finishing my thesis next week, so I would prefer sometime in the last 10 days of August. Is that okay for you guys?

sbrunk commented 4 years ago

Fine with me.