eaplatanios / tensorflow_scala

TensorFlow API for the Scala Programming Language
http://platanios.org/tensorflow_scala/
Apache License 2.0

Integration with TensorBoard #79

Closed rikvdkleij closed 6 years ago

rikvdkleij commented 6 years ago

How can I integrate with TensorBoard when I do not have a layer, and therefore cannot create a model to build an estimator?

My code has a similar structure as the LinearRegression example.

eaplatanios commented 6 years ago

You can always use summary ops as you would with the Python API, along with summary writers and then manually launch TensorBoard from the relevant log directory. Otherwise, you can extend the Layer class yourself (all you need to implement is the forward method which should look pretty much like your current code, just placed inside that method) and train it using an estimator. Does that help? :)
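For the second option, wrapping existing ops in a custom Layer might look roughly like the sketch below. This is only an outline: the exact `Layer`/`forward` signature has changed across versions of the library, and the variable names are illustrative, so check the `Layer` class in the library for the current API before copying this.

```scala
import org.platanios.tensorflow.api._

// Rough sketch: wrapping linear-regression ops in a custom Layer so that an
// estimator can train it. The body of `forward` is essentially the code you
// already have, moved inside the method.
object Linear extends tf.learn.Layer[Output, Output]("Linear") {
  override def forward(input: Output)(implicit mode: tf.learn.Mode): Output = {
    val weights = tf.variable("weights", FLOAT32, Shape(1, 1))
    val bias = tf.variable("bias", FLOAT32, Shape(1))
    tf.matmul(input, weights) + bias
  }
}
```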

rikvdkleij commented 6 years ago

Thanks! I will try that, but it would be nice to have an example based on the LinearRegression example.

rikvdkleij commented 6 years ago

The Python API for run works differently. When I pass a summary_op, I would expect a Summary back. In the Scala API, run returns a Seq[Tensor].

eaplatanios commented 6 years ago

In both the Python and Scala APIs, if for example you call Session.run on a single scalar summary op, you should get a Tensor back, containing a single string, which is a Summary Protobuf message. Could you please share a minimal Python and Scala example where they work differently? There may be a bug and that would help me find it. Thanks!

rikvdkleij commented 6 years ago

I could not get it working. There was also some other problem with exporting the trained model result. So I decided to rewrite the autoencoder so that I can use a model and an estimator. But of course I ran into problems :-). First, creating a model expects it to be supervised. I tried to work around that by zipping the dataset, but now I run into a problem with the SequenceLoss:

Exception in thread "main" org.platanios.tensorflow.api.core.package$exception$InvalidShapeException: 'logits' must have shape [batchSize, sequenceLength, numClasses], but had: [?, 4].

In both the Python and Scala APIs, if for example you call Session.run on a single scalar summary op, you should get a Tensor back, containing a single string, which is a Summary Protobuf message.

Yes, I used that to log the loss. Later on I changed it to:

val Seq(l, s) = session.run(fetches = Seq(loss, summaryOp.get), targets = train, feeds = Map(inputLayer -> data))
summaryWriter.writeSummary(s)

That did not work.

eaplatanios commented 6 years ago

@rikvdkleij Regarding the second part of your answer which relates to this issue, you have to extract the summary protobuf string from the returned tensor object, like this:

summaryWriter.writeSummaryString(s.scalar.asInstanceOf[String])

In general, to get a better idea of how those classes are supposed to be used, you can look into the implementation of the estimators and the summary writer hook, in my library.

In Python you don't need to do anything like that because it's dynamically typed. I could make writeSummaryString accept a tensor as input directly, but that's more hacky in my opinion and does not represent what argument writeSummaryString is supposed to accept. If you have an idea for a better way to do this, please let me know. :)

Regarding your other comment, estimators do not expect models to be supervised. The Model object contains multiple constructors that allow for unsupervised models too. I would have to see a minimal reproducible code example to help you with this.

rikvdkleij commented 6 years ago

The Model object contains multiple constructors that allow for unsupervised models too.

If you look here https://github.com/eaplatanios/tensorflow_scala/blob/c07d2b810c87a26eb60611363e026c78d58b690b/api/src/main/scala/org/platanios/tensorflow/api/learn/Model.scala#L57 you'll see that each apply creates a supervised model.

Tomorrow I will paste some code.

rikvdkleij commented 6 years ago

About the original issue, this is the code I currently have for the implementation without model:

    val session = Session()
    val summariesDir = Paths.get("./autoencoder")
    tf.summary.scalar("loss", loss)
    val summaryOp = tf.summary.mergeAll()
    val summaryWriter = tf.summary.FileWriter(summariesDir, session.graph)
    session.run(targets = tf.globalVariablesInitializer())

    for (i <- 1 to epoch) {
      val Seq(l, s) = session.run(fetches = Seq(loss, summaryOp.get), targets = train, feeds = Map(inputLayer -> data))
      summaryWriter.writeSummaryString(s.scalar.asInstanceOf[String])
      println(s"Epoch $i: loss ${l.scalar}")
    }

    val s = tf.saver()
    s.save(session, Paths.get("./autoencoder/forbigdl.chkp"))
    summaryWriter.writeGraph(session.graph)

It does write the graph but not the loss for every iteration. Any idea? Btw, I hope you can find a better solution in Scala for this :-):

s.scalar.asInstanceOf[String]

rikvdkleij commented 6 years ago

Code to create Autoencoder with model: https://gist.github.com/rikvdkleij/1bb0a3ec1365fa2868952716f4df3b70

Currently it gives this exception:

Exception in thread "main" org.platanios.tensorflow.api.core.package$exception$InvalidShapeException: 'logits' must have shape [batchSize, sequenceLength, numClasses], but had: [?, 4].

I think it has something to do with the workaround of going from an unsupervised to a supervised model.

Thank you very much for helping!

eaplatanios commented 6 years ago

@rikvdkleij I just added a constructor for unsupervised models that I had forgotten (I'm sorry about that) and modified the model API a bit. I'm uploading artifacts now. The learn API is still very much work-in-progress and I improve it based on use cases I encounter. Let me know if you have any feedback on what could be done better and make your life easier. :)

Regarding the s.scalar.asInstanceOf[String] issue, I have given this a lot of thought in the past and tried various things, but I couldn't find a better way. Tensors in the TensorFlow C++ API are also not strongly-typed (their data type is not known at compile-time). The main reason for that is that for certain ops it's hard to specify what the output data type will be given the data types of the input tensors. The only solution I see would be to specify all the data type relations when you construct the ops, but that would be too tedious as there are thousands of ops (and you'd probably need lots of type traits to do that too). I think the current solution is ok, in that you shouldn't generally have to deal with extracting values from tensors in that way. If you can think of a nicer way for dealing with this, please let me know. I also really dislike having to do that to extract element values.

The issue with the last code sample you provided is that you're using SequenceLoss when you should probably be using L2Loss. The sequence loss is intended for data with a sequential nature (e.g., time series).
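Concretely, in the gist's model definition the fix would amount to swapping the loss, along the lines of the hypothetical snippet below (the constructor argument is just a display name; check the library's loss classes for the exact signature):

```scala
// Hypothetical sketch: an autoencoder's reconstruction loss compares the
// logits with the input directly, so an L2 loss fits better than a
// SequenceLoss, which assumes [batchSize, sequenceLength, numClasses] logits.
val loss = tf.learn.L2Loss("Loss/L2")
```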

Regarding the issue with summaries, could you try adding summaryWriter.flush() right after summaryWriter.writeSummaryString(s.scalar.asInstanceOf[String]) and see what happens?

rikvdkleij commented 6 years ago

The issue with the last code sample you provided is that you're using SequenceLoss when you should probably be using L2Loss. The sequence loss is intended for data with a sequential nature (e.g., time series).

Yes, that helps! I'm now running into the next problem :-)

Regarding the issue with summaries, could you try adding summaryWriter.flush() right after summaryWriter.writeSummaryString(s.scalar.asInstanceOf[String]) and see what happens?

No, that does not help.

eaplatanios commented 6 years ago

@rikvdkleij So what does the summaries directory contain if you remove the writeGraph line?

rikvdkleij commented 6 years ago

Just a spike at 0.

eaplatanios commented 6 years ago

Oh I see. writeSummaryString takes the step for that summary as a second argument. You should add i as the second argument in that method call. :)
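Applied to the training loop from the earlier snippet, that would look like this (a sketch based on that code, with `i` passed as the step):

```scala
for (i <- 1 to epoch) {
  val Seq(l, s) = session.run(
    fetches = Seq(loss, summaryOp.get), targets = train, feeds = Map(inputLayer -> data))
  // Pass the epoch as the step so each summary lands at its own position on
  // the TensorBoard x-axis instead of all being written at the default step 0.
  summaryWriter.writeSummaryString(s.scalar.asInstanceOf[String], i)
  println(s"Epoch $i: loss ${l.scalar}")
}
```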

rikvdkleij commented 6 years ago

Yes, that works! Thanks! A default value of 0 is tricky :-)

eaplatanios commented 6 years ago

No problem! And yeah good point. I’ll remove those default values. :)