eaplatanios / tensorflow_scala

TensorFlow API for the Scala Programming Language
http://platanios.org/tensorflow_scala/
Apache License 2.0

Serialisation of tensors whose content is greater than 2GB #110

Closed mandar2812 closed 6 years ago

mandar2812 commented 6 years ago

I was training a CNN model (TF GPU) on some image data, and it seems the input tensors could not be serialised.

Is there some way around this? I would like to use data whose size is even larger than 2 GB.
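
Just for a sense of scale (the numbers below are illustrative, not my actual dataset), even a fairly modest image set exceeds the 2 GB protobuf cap once it is materialised as a single tensor:

// Illustrative back-of-the-envelope check only (hypothetical sizes).
val numImages     = 50000
val bytesPerImage = 128 * 128 * 3 * 4L           // 128x128 RGB images, float32
val totalBytes    = numImages * bytesPerImage    // ~9.8 GB as one flat tensor
val protoLimit    = Int.MaxValue.toLong          // ~2.1 GB: protobuf's per-message cap
println(s"dataset: $totalBytes bytes vs. proto limit: $protoLimit bytes")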

Building the regression model.
2018-06-05 13:32:41.567195: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306] Adding visible gpu devices: 0
2018-06-05 13:32:41.567562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:987] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 370 MB memory) -> physical GPU (device: 0, name: TITAN X (Pascal), pci bus id: 0000:03:00.0, compute capability: 6.1)

Training the regression model.

2018-06-05 13:32:41.718663: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306] Adding visible gpu devices: 0
2018-06-05 13:32:41.718929: I tensorflow/core/common_runtime/gpu/gpu_device.cc:987] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 370 MB memory) -> physical GPU (device: 0, name: TITAN X (Pascal), pci bus id: 0000:03:00.0, compute capability: 6.1)
org.platanios.tensorflow.jni.InvalidArgumentException: Cannot serialize tensors whose content is larger than 2GB.
  org.platanios.tensorflow.jni.InvalidArgumentException$.apply(TensorFlowException.scala:99)
  org.platanios.tensorflow.api.tensors.Tensor$.makeProto(Tensor.scala:598)
  org.platanios.tensorflow.api.ops.Basic$class.constant(Basic.scala:62)
  org.platanios.tensorflow.api.ops.Basic$.constant(Basic.scala:1652)
  org.platanios.tensorflow.api.tensors.Tensor.toOutput(Tensor.scala:363)
  org.platanios.tensorflow.api.ops.io.data.Data$$anon$1.flattenedOutputsFromT(Data.scala:134)
  org.platanios.tensorflow.api.ops.io.data.Data$$anon$1.flattenedOutputsFromT(Data.scala:119)
  org.platanios.tensorflow.api.ops.io.data.TensorSlicesDataset.createHandle(TensorSlicesDataset.scala:47)
  org.platanios.tensorflow.api.ops.io.data.ZipDataset$$anonfun$createHandle$1.apply(ZipDataset.scala:55)
  org.platanios.tensorflow.api.ops.io.data.ZipDataset$$anonfun$createHandle$1.apply(ZipDataset.scala:55)
  scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
  org.platanios.tensorflow.api.ops.Op$.createWithNameScope(Op.scala:889)
  org.platanios.tensorflow.api.ops.io.data.ZipDataset.createHandle(ZipDataset.scala:55)
  org.platanios.tensorflow.api.ops.io.data.ZipDataset$$anonfun$createHandle$1.apply(ZipDataset.scala:55)
  org.platanios.tensorflow.api.ops.io.data.ZipDataset$$anonfun$createHandle$1.apply(ZipDataset.scala:55)
  scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
  org.platanios.tensorflow.api.ops.Op$.createWithNameScope(Op.scala:889)
  org.platanios.tensorflow.api.ops.io.data.ZipDataset.createHandle(ZipDataset.scala:55)
  org.platanios.tensorflow.api.ops.io.data.RepeatDataset$$anonfun$createHandle$1.apply(RepeatDataset.scala:42)
  org.platanios.tensorflow.api.ops.io.data.RepeatDataset$$anonfun$createHandle$1.apply(RepeatDataset.scala:42)
  scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
  org.platanios.tensorflow.api.ops.Op$.createWithNameScope(Op.scala:889)
  org.platanios.tensorflow.api.ops.io.data.RepeatDataset.createHandle(RepeatDataset.scala:42)
  org.platanios.tensorflow.api.ops.io.data.ShuffleDataset$$anonfun$createHandle$1.apply(ShuffleDataset.scala:47)
  org.platanios.tensorflow.api.ops.io.data.ShuffleDataset$$anonfun$createHandle$1.apply(ShuffleDataset.scala:47)
  scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
  org.platanios.tensorflow.api.ops.Op$.createWithNameScope(Op.scala:889)
  org.platanios.tensorflow.api.ops.io.data.ShuffleDataset.createHandle(ShuffleDataset.scala:47)
  org.platanios.tensorflow.api.ops.io.data.BatchDataset$$anonfun$createHandle$1.apply(BatchDataset.scala:42)
  org.platanios.tensorflow.api.ops.io.data.BatchDataset$$anonfun$createHandle$1.apply(BatchDataset.scala:42)
  scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
  org.platanios.tensorflow.api.ops.Op$.createWithNameScope(Op.scala:889)
  org.platanios.tensorflow.api.ops.io.data.BatchDataset.createHandle(BatchDataset.scala:42)
  org.platanios.tensorflow.api.ops.io.data.PrefetchDataset$$anonfun$createHandle$1.apply(PrefetchDataset.scala:41)
  org.platanios.tensorflow.api.ops.io.data.PrefetchDataset$$anonfun$createHandle$1.apply(PrefetchDataset.scala:41)
  scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
  org.platanios.tensorflow.api.ops.Op$.createWithNameScope(Op.scala:889)
  org.platanios.tensorflow.api.ops.io.data.PrefetchDataset.createHandle(PrefetchDataset.scala:41)
  org.platanios.tensorflow.api.ops.io.data.Iterator$$anonfun$createInitializer$3.apply(Iterator.scala:71)
  org.platanios.tensorflow.api.ops.io.data.Iterator$$anonfun$createInitializer$3.apply(Iterator.scala:71)
  scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
  org.platanios.tensorflow.api.ops.Op$.colocateWith(Op.scala:917)
  org.platanios.tensorflow.api.ops.io.data.Iterator.createInitializer(Iterator.scala:70)
  org.platanios.tensorflow.api.learn.estimators.FileBasedEstimator$$anonfun$trainWithHooks$1.apply$mcV$sp(FileBasedEstimator.scala:155)
  org.platanios.tensorflow.api.learn.estimators.FileBasedEstimator$$anonfun$trainWithHooks$1.apply(FileBasedEstimator.scala:137)
  org.platanios.tensorflow.api.learn.estimators.FileBasedEstimator$$anonfun$trainWithHooks$1.apply(FileBasedEstimator.scala:137)
  scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
  org.platanios.tensorflow.api.ops.Op$.createWith(Op.scala:862)
  org.platanios.tensorflow.api.learn.estimators.FileBasedEstimator.trainWithHooks(FileBasedEstimator.scala:137)
  org.platanios.tensorflow.api.learn.estimators.FileBasedEstimator.train(FileBasedEstimator.scala:86)
  io.github.mandar2812.dynaml.tensorflow.package$dtflearn$$anonfun$6.apply(package.scala:607)
  io.github.mandar2812.dynaml.tensorflow.package$dtflearn$$anonfun$6.apply(package.scala:587)
  scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
  org.platanios.tensorflow.api.ops.Op$.createWith(Op.scala:862)
  org.platanios.tensorflow.api.ops.Op$API$class.createWith(Op.scala:425)
  org.platanios.tensorflow.api.package$tf$.createWith(package.scala:191)
  io.github.mandar2812.dynaml.tensorflow.package$dtflearn$.build_tf_model(package.scala:587)
  io.github.mandar2812.PlasmaML.helios.package$.run_experiment_omni_ext(package.scala:1486)
  ammonite.$file.helios.scripts.omni_ext_ctl$.main(omni_ext_ctl.sc:165)
  ammonite.$sess.cmd1$.<init>(cmd1.sc:1)
  ammonite.$sess.cmd1$.<clinit>(cmd1.sc)

DynaML>
eaplatanios commented 6 years ago

@mandar2812 Unfortunately, that is a protobuf limitation in TensorFlow. The serialization code is not part of the Scala API; it directly invokes the native TensorFlow serialization code. You could write a custom serialization method for this purpose.
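
If you do need to persist such a tensor yourself, here is a minimal sketch of a custom serializer using plain JVM I/O (this is not part of the tensorflow_scala API, and it assumes the data is already available as a flat Array[Float]); it streams the contents to disk in chunks that each stay well below 2 GB:

import java.nio.{ByteBuffer, ByteOrder}
import java.nio.channels.FileChannel
import java.nio.file.{Paths, StandardOpenOption}

// Sketch of a chunked writer that never builds a single >2GB buffer or proto.
object ChunkedTensorIO {
  private val ChunkElems = 64 * 1024 * 1024  // 64M floats per chunk = 256 MB

  def writeFloats(path: String, data: Array[Float]): Unit = {
    val channel = FileChannel.open(
      Paths.get(path),
      StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)
    try {
      var offset = 0
      while (offset < data.length) {
        val n   = math.min(ChunkElems, data.length - offset)
        val buf = ByteBuffer.allocate(n * 4).order(ByteOrder.LITTLE_ENDIAN)
        buf.asFloatBuffer().put(data, offset, n)  // float view shares the backing bytes
        channel.write(buf)                        // position still 0, limit = n * 4
        offset += n
      }
    } finally channel.close()
  }
}

Reading back would mirror this with FileChannel.read and a FloatBuffer view; shape and dtype metadata would have to be stored separately (e.g. in a small header or sidecar file).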

mandar2812 commented 6 years ago

@eaplatanios Interesting! Would this then be a limitation for anyone using the TensorFlow Python API as well as the native API, assuming they use the protobuf serialiser?

eaplatanios commented 6 years ago

@mandar2812 Yes, that is correct. I think that in both the Scala and the Python API you'd need to use a different serialization method to avoid this issue. I'll close this, given that this library relies on the TensorFlow-provided serialization support.
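
For reference, the usual way around this in any of the APIs is to keep the bulk data out of the graph entirely, e.g. by writing it to TFRecord files up front and building the dataset from file names, so no single tensor proto ever holds the whole dataset. Below is only a rough sketch against the 0.x Scala data API; the exact names (TFRecordDataset, the transformation methods, and the decodeRecord placeholder) are assumptions and should be checked against the version in use.

import org.platanios.tensorflow.api._

// Hypothetical sketch: class and method names are assumptions about the 0.x data API.
// decodeRecord stands in for user-supplied parsing of a serialized record into features.
def decodeRecord(record: Output): Output = record  // placeholder: real code would parse the example

val files = Tensor("/data/images-00000.tfrecord", "/data/images-00001.tfrecord")
val pipeline =
  tf.data.TFRecordDataset(files)           // streams raw records from disk
    .map(record => decodeRecord(record))   // decode on the fly, inside the graph
    .shuffle(bufferSize = 10000)
    .repeat()
    .batch(32)
    .prefetch(8)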