eaplatanios / tensorflow_scala

TensorFlow API for the Scala Programming Language
http://platanios.org/tensorflow_scala/
Apache License 2.0

Slicing with a single integer behaves differently than in numpy #21

Closed sbrunk closed 6 years ago

sbrunk commented 6 years ago

I stumbled upon this while working on #20.

In numpy, when slicing with a single integer, the corresponding axis is removed, reducing the dimensionality of the resulting tensor.

>>> import numpy as np
>>> x = np.full((2,2), 1)
>>> x.shape
(2, 2)
>>> x[0]
array([1, 1])
>>> x[0].shape
(2,)

If you wanted to keep the axis, you'd have to do x[0:1] instead.
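For comparison, continuing the numpy session from above, slicing with a range keeps the axis (with size one):

>>> x[0:1]
array([[1, 1]])
>>> x[0:1].shape
(1, 2)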

In your tensor API, slicing keeps the axis (with a size of one), so you have to reshape in order to remove it.

scala> import org.platanios.tensorflow.api._
import org.platanios.tensorflow.api._

scala> val t = tf.Tensor.fill(tf.INT32, tf.shape(2,2))(1)
t: org.platanios.tensorflow.api.tensors.Tensor = INT32[2, 2]

scala> t.slice(0)
res1: org.platanios.tensorflow.api.tensors.Tensor = INT32[1, 2]

scala> t.slice(0).reshape(tf.shape(2))
res2: org.platanios.tensorflow.api.tensors.Tensor = INT32[2]

I'm just wondering if this difference is on purpose, or if it should be made consistent with numpy.

eaplatanios commented 6 years ago

@sbrunk Sorry for the very late response, but I was traveling both for work and for vacation, and I was also working on a major overhaul of the tensors API in my library. I added support for automatically generating JNI bindings for eagerly executing TensorFlow ops from Scala. It's still a work in progress, but to give you an example, your code from above would now become:

scala> import org.platanios.tensorflow.api._
import org.platanios.tensorflow.api._

scala> val t = Tensor.fill(INT32, Shape(2, 2))(1)
00:38:10.138 [run-main-1] INFO  TensorFlow Native - Failed to load the TensorFlow native library with error: no tensorflow_jni in java.library.path. Attempting to load it as a resource.
00:38:10.141 [run-main-1] INFO  TensorFlow Native - Extracting TensorFlow native library to /var/folders/rw/lqrc8nk52kqcc4_zq3c2b6dh0000gn/T/libtensorflow_jni228284366582991515.dylib.
00:38:10.143 [run-main-1] INFO  TensorFlow Native - Copied 409624 bytes to /var/folders/rw/lqrc8nk52kqcc4_zq3c2b6dh0000gn/T/libtensorflow_jni228284366582991515.dylib.
00:38:10.145 [run-main-1] INFO  TensorFlow Native - Loaded the TensorFlow native library as a resource.
t: org.platanios.tensorflow.api.tensors.Tensor = INT32[2, 2]

scala> t(0)
res0: org.platanios.tensorflow.api.tensors.Tensor = INT32[2]

The logging can also be controlled through the log4j configuration file.

There are currently some issues related to string tensors that have to do with a potential bug in the TensorFlow native library, but I should have them resolved soon. I'm also looking into how to automatically generate documentation for both symbolic ops and eager tensor execution ops, so as to avoid duplicating the documentation in two places.

Please let me know your thoughts on the new structure of the tensor API. :)

sbrunk commented 6 years ago

I was on vacation myself, so I haven't had the time to try anything out yet.

Automatically generated numpy-like ops sound great. I'll give it a try soon.

eaplatanios commented 6 years ago

@sbrunk By the way, regarding the learning API that I had mentioned at some point earlier, I have pushed an early draft that I'm working on in the org.platanios.tensorflow.api.learn package. It closely follows the TensorFlow Estimators API, but cleans it up and adds strong typing in various places. I plan to add some tutorials soon, but I've delayed that because I'm actively working on it now and some things might change. On the plus side, it also adds some building blocks for supporting distributed training. :)

eaplatanios commented 6 years ago

Given that this issue has been resolved, I'll close this. Please feel free to re-open if you find any more indexing issues or incompatibilities with numpy-style indexing. :)