eaplatanios / tensorflow_scala

TensorFlow API for the Scala Programming Language
http://platanios.org/tensorflow_scala/
Apache License 2.0

Custom OP not working correctly with TensorFlow Scala #146

Closed by Spiess 5 years ago

Spiess commented 5 years ago

I am using the rasterize_triangles_kernel.so operation from https://github.com/google/tf_mesh_renderer (commit a6403fbb36a71443ecb822e435e5724550d2b52b) in one of my projects.

Running it in Python works without problems:

import tensorflow as tf

class RasterizerTest(tf.test.TestCase):
  def testRasterizer(self):
    rasterizer_module = tf.load_op_library('./rasterize_triangles_kernel.so')
    with self.test_session():
      result = rasterizer_module.rasterize_triangles([[0, 1, 2], [0, 1, 2], [0, 1, 2]], [[0, 1, 2], [0, 1, 2], [0, 1, 2]], 200, 200)
      print(result.triangle_ids.eval())

if __name__ == "__main__":
  tf.test.main()

To get it to load and register in TensorFlow Scala at all, I had to compile it against the TensorFlow nightly build and add the following shape function:

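.SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
      // Read the image size from the op attributes and propagate lookup errors
      // instead of silently continuing with uninitialized values.
      int imgWidth;
      int imgHeight;
      TF_RETURN_IF_ERROR(c->GetAttr("image_width", &imgWidth));
      TF_RETURN_IF_ERROR(c->GetAttr("image_height", &imgHeight));
      // Output 0 has shape [height, width, 3]; outputs 1 and 2 have shape [height, width].
      c->set_output(0, c->MakeShape({imgHeight, imgWidth, 3}));
      c->set_output(1, c->MakeShape({imgHeight, imgWidth}));
      c->set_output(2, c->MakeShape({imgHeight, imgWidth}));
      return Status::OK();
    })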

While it still works in Python, running it in TensorFlow Scala crashes with a segfault. I traced this to the input data: the shapes of the operation inputs that arrive in the C++ code through the OpKernelContext are correct, but the values are completely wrong.

The following is my test code:

package meshrenderer

import org.platanios.tensorflow.api.core.Shape
import org.platanios.tensorflow.api.core.client.FeedMap
import org.platanios.tensorflow.api.{Op, Output, tf, _}
import org.platanios.tensorflow.api.ops.Gradients
import org.platanios.tensorflow.api.tensors.Tensor

object TestKernel {

  def main(args: Array[String]): Unit = {
    org.platanios.tensorflow.jni.TensorFlow.loadOpLibrary("lib/rasterize_triangles_kernel.so")

    val image_width = 227
    val image_height = 227

    val vertices = Tensor(0 until 30).toFloat.reshape(Shape(10, 3))
    val triangles = Tensor(Seq(0 until 15)).reshape(Shape(5, 3))

    val gradientFn: Gradients.GradientFn[Seq[Output[Any]], Seq[Output[Float]], Seq[Output[Any]], Seq[Output[Float]]] = rasterizeTrianglesGrad

    val inputs: Seq[Output[Any]] = Seq(vertices.toOutput, triangles.toOutput)

    val outs: Op[Seq[Output[Any]], Seq[Output[Float]]] = Op.Builder[Seq[Output[Any]], Seq[Output[Float]]](opType = "RasterizeTriangles", "rasterize_triangles", inputs, addAsIndividualInputs = true)
      .setAttribute("image_width", image_width)
      .setAttribute("image_height", image_height)
      .build()

    println(outs.outputsSeq)

    println(triangles.summarize())

    using(Session())(session => {
      val results = session.run(fetches = outs.outputsSeq(1))

      println(results.summarize())
    })
  }
}

In my tests, every value passed through the OpKernelContext is 1 for the vertices input and 1065454216 for the triangles input.
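An integer like 1065454216 is close to the bit pattern of a float32 near 1.0 (the bit pattern of 1.0 itself is 1065353216). That may hint that the kernel is reading the buffer with a different type or layout than the caller wrote, though it does not establish the cause. The following is a minimal sketch for inspecting such values, using only the Python standard library; the helper names are hypothetical and not part of TensorFlow or TensorFlow Scala.

```python
import struct


def int32_bits_as_float32(value):
    """Reinterpret the 32 bits of a signed integer as an IEEE 754 float32."""
    return struct.unpack("<f", struct.pack("<i", value))[0]


def float32_bits_as_int32(value):
    """Reinterpret the 32 bits of a float32 as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<f", value))[0]


if __name__ == "__main__":
    # The value observed for the triangles input, viewed as a float32
    # (roughly 1.012).
    print(int32_bits_as_float32(1065454216))
    # The float 1.0 viewed as an int32, for comparison (1065353216).
    print(float32_bits_as_int32(1.0))
```

If the reinterpreted values look like plausible data of the other input's type, that would be consistent with mismatched memory layouts between the library that built the tensor and the library that reads it.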

Spiess commented 5 years ago

I forgot to mention: I have observed this issue with both the precompiled TensorFlow Scala 0.4.1 and the current TensorFlow Scala compiled against TensorFlow r1.12.

Spiess commented 5 years ago

I was able to fix the issue by compiling the custom operation using tf-nightly-gpu==1.13.0.dev20181121.

Is there any way to tell which TensorFlow version or nightly a given TensorFlow Scala release was compiled against? I had to guess that nightly from the release date.
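On the custom-op side, the TensorFlow build used for compilation can at least be recorded, so it can be compared against whatever version the TensorFlow Scala binaries report. The sketch below assumes the op is compiled against the TensorFlow pip package installed in the current Python environment; the function name is hypothetical, and it does not answer which build a TensorFlow Scala release itself uses.

```python
import importlib


def describe_tensorflow_build():
    """Return version details of the installed TensorFlow pip package.

    Returns None when TensorFlow is not installed in this environment.
    """
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        return None
    return {
        # Release or nightly version string of the pip package.
        "version": tf.__version__,
        # Git description of the commit the package was built from.
        "git_version": getattr(tf, "__git_version__", "unknown"),
        # Flags to pass to the compiler and linker when building a custom op
        # against this package.
        "compile_flags": tf.sysconfig.get_compile_flags(),
        "link_flags": tf.sysconfig.get_link_flags(),
    }


if __name__ == "__main__":
    print(describe_tensorflow_build())
```

Keeping this output alongside the compiled .so makes it easier to see later which nightly the op was built against.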