eaplatanios / tensorflow_scala

TensorFlow API for the Scala Programming Language
http://platanios.org/tensorflow_scala/
Apache License 2.0

Cannot create a Saver from MetaGraphDef #148

Closed xtordoir closed 5 years ago

xtordoir commented 5 years ago

Hi, I am trying to load a checkpoint from ssd_mobilenet_v2_coco (https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md). Am I missing something here, or is there some incompatibility between the checkpoint and the library that needs investigating?

I am using TF Scala version 0.4.1:

val modelMetaGraphPath = s"${home}/data/models/ssd_mobilenet_v2_coco_2018_03_29/model.ckpt.meta"
val metaGraphDef = MetaGraphDef.parseFrom(
  new BufferedInputStream(new FileInputStream(new File(modelMetaGraphPath))))

tf.createWith(graph = Graph()) {
  val session = Session()
  val saver = tf.Saver.fromMetaGraphDef(metaGraphDef = metaGraphDef)
}

[error] (run-main-0) java.lang.IllegalArgumentException: Data type C value '101' is not recognized in Scala (TensorFlow version 1.12.0-rc0).
[error] java.lang.IllegalArgumentException: Data type C value '101' is not recognized in Scala (TensorFlow version 1.12.0-rc0).
[error]         at org.platanios.tensorflow.api.core.types.DataType$.fromCValue(DataType.scala:157)

xtordoir commented 5 years ago

There are some operations with is_ref: true in this MetaGraphDef, and I bet that is where the issue arises. It looks like REF tensors are supported in the Python API, although the protobuf file marks them as not to be used:

https://github.com/tensorflow/tensorflow/blob/aa9a1127a894f66eaffe3ee60191a39ffa6cc66d/tensorflow/core/framework/types.proto#L42

I am continuing to look into it, but any suggestion is welcome. Is there any way I can convert the metagraph before creating the saver? Should references be supported in the Scala API?
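
For reference, the value 101 in the error corresponds to DT_FLOAT_REF in types.proto, where each reference type reuses the value of its base type plus 100. Below is a minimal sketch of that decoding in plain Scala. The object name DataTypeDecoder and the method describe are hypothetical (they are not part of TF Scala), and only a few base types are listed for illustration.

```scala
// Sketch only: decodes a few numeric DataType values from TensorFlow's types.proto.
// DataTypeDecoder and describe are hypothetical names, not part of TF Scala.
object DataTypeDecoder {
  // A handful of base enum values from types.proto (the list is not complete).
  private val baseTypes: Map[Int, String] = Map(
    1 -> "DT_FLOAT",
    2 -> "DT_DOUBLE",
    3 -> "DT_INT32",
    7 -> "DT_STRING",
    9 -> "DT_INT64",
    10 -> "DT_BOOL"
  )

  // Reference types use the base value plus this offset (e.g. DT_FLOAT_REF = 101).
  private val RefOffset = 100

  /** Returns a readable name for a C data type value, marking reference types. */
  def describe(cValue: Int): String = {
    val isRef = cValue > RefOffset
    val base = if (isRef) cValue - RefOffset else cValue
    val name = baseTypes.getOrElse(base, s"unknown($base)")
    if (isRef) s"${name}_REF" else name
  }
}
```

With this sketch, DataTypeDecoder.describe(101) returns "DT_FLOAT_REF", which matches the reference-typed tensors suspected above.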

eaplatanios commented 5 years ago

@xtordoir Yes, that is the problem. Unfortunately, TensorFlow has two ways to represent variables: reference variables and resource variables. In the TF Python API 1.x, reference variables are the default (you can use resource variables by setting the use_resource argument to True when creating them). Resource variables were developed later and are supposed to replace reference variables in TF 2. In fact, they were supposed to replace them a while ago. For this reason, and because resource variables offer some benefits, I decided to go with resource variables in the Scala API and avoid the confusion between the two. Unfortunately, this also means that reference variables are not currently supported in TF Scala, and I don't plan to add support for them since they're on their way out and, in my opinion, can only result in confusion. Please do tell me if you feel there are important reasons to support them and I might reconsider.
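
To check which kind of variable a saved graph uses, one option is to look at the op type of each node: reference variables are created by the Variable and VariableV2 ops, while resource variables are created by VarHandleOp. The following is a rough sketch in plain Scala that works on (node name, op type) pairs. The object VariableKinds and its members are hypothetical names used for illustration and are not part of TF Scala.

```scala
// Sketch only: classifies graph nodes by the kind of variable op they use.
// VariableKinds and its members are hypothetical names, not part of TF Scala.
object VariableKinds {
  sealed trait Kind
  case object Reference extends Kind    // legacy reference variables
  case object Resource extends Kind     // resource variables
  case object NotAVariable extends Kind // any other op

  /** Maps a node's op type to the kind of variable it creates, if any. */
  def classify(op: String): Kind = op match {
    case "Variable" | "VariableV2" => Reference
    case "VarHandleOp"             => Resource
    case _                         => NotAVariable
  }

  /** Returns the names of all reference-variable nodes among (name, op) pairs. */
  def referenceVariables(nodes: Seq[(String, String)]): Seq[String] =
    nodes.collect { case (name, op) if classify(op) == Reference => name }
}
```

The pairs could be built from a parsed MetaGraphDef, for example by reading getName and getOp from each entry of metaGraphDef.getGraphDef.getNodeList, assuming the generated protobuf Java classes are on the classpath. A non-empty result would suggest that the graph cannot be loaded as-is.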

Regarding loading these checkpoints, I am not sure whether there is a way to automatically convert them to use resource variables instead of reference ones, but I can look into it during the weekend.

Just to clarify, is the model.ckpt.meta file in binary format, or human-readable text format?

xtordoir commented 5 years ago

Thank you!

The meta file is binary, not text.

There is no need to go down the path of supporting reference variables in TF Scala. In the short term, I can always convert the model with a Python script until there is a way to read these meta files from Scala. I am also investigating working with models saved using the TensorFlow Hub library. I'll keep you posted on progress there as well.

sbrunk commented 5 years ago

If you just need the inference model, loading the frozen inference graph used to work for me (albeit with an older version of the model and TF Scala), like this:

val modelDir = "ssd_inception_v2_coco_2017_11_17"
val graphDef = GraphDef.parseFrom(
  new BufferedInputStream(
    new FileInputStream(new File(new File("models", modelDir), "frozen_inference_graph.pb"))))
val graph = Graph.fromGraphDef(graphDef)

xtordoir commented 5 years ago

Yes, thanks @sbrunk, it works with frozen graphs. I am trying to get a broader view of where the snake bites me, so that I can put together guidelines on how to save models and training sessions in a way that is compatible with TF Scala.

eaplatanios commented 5 years ago

@sbrunk Thanks for the suggestion! @xtordoir putting together some rough guidelines would be great. I haven't really used prebuilt models in Python and I'm not aware of the most common issues.

I'll close this issue now since it's resolved. Feel free to reopen if there are still problems.