Open SleepProgger opened 5 years ago
After working around (albeit in a very crude way) the invalid OpenCL kernel generated by plaidml (https://github.com/plaidml/plaidml/issues/322), I now get the same error with tensorflow.keras and with keras with the keras backend set to tensorflow, i.e.:
Traceback (most recent call last):
File "test_ngrapg_tf.py", line 39, in <module>
model.fit(x_train, y_train, epochs=5)
File "/run/media/nope/data/home/nope/workspace/test/fs/ngraph-tf_master/build_cmake/venv-tf-py3/lib/python3.5/site-packages/tensorflow/python/keras/engine/training.py", line 880, in fit
validation_steps=validation_steps)
File "/run/media/nope/data/home/nope/workspace/test/fs/ngraph-tf_master/build_cmake/venv-tf-py3/lib/python3.5/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 329, in model_iteration
batch_outs = f(ins_batch)
File "/run/media/nope/data/home/nope/workspace/test/fs/ngraph-tf_master/build_cmake/venv-tf-py3/lib/python3.5/site-packages/tensorflow/python/keras/backend.py", line 3076, in __call__
run_metadata=self.run_metadata)
File "/run/media/nope/data/home/nope/workspace/test/fs/ngraph-tf_master/build_cmake/venv-tf-py3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
run_metadata_ptr)
File "/run/media/nope/data/home/nope/workspace/test/fs/ngraph-tf_master/build_cmake/venv-tf-py3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Caught exception while compiling op_backend: get_shape() must be called on a node with exactly one output ()
[[{{node ngraph_cluster_44}}]]
or a segfault (sometimes one, sometimes the other).
I plan to try an Ubuntu-based distro tomorrow to see whether this is indeed Manjaro-related.
Sadly, basically the same behavior under Mint (Ubuntu LTS based).
I have been trying for several days to get ngraph-tf to run under Manjaro and have run into multiple problems. The goal is to use ngraph-tf with the plaidml backend.
I am testing with the following code:
When trying to run it with tensorflow.keras and the ngraph backend set to PLAIDML (
USE_TF_KERAS=1 KERAS_BACKEND="tensorflow" python test_ngrapg_tf.py PLAIDML
) I get a segfault or this stacktrace (sometimes one, sometimes the other):
When trying to run it with keras with the keras backend set to tensorflow (
USE_TF_KERAS=0 KERAS_BACKEND="tensorflow" python test_ngrapg_tf.py PLAIDML
) I reliably get invalid OpenCL kernels generated by plaidml (see https://github.com/plaidml/plaidml/issues/322).
Both versions can execute the prediction step just fine, although keras with the tensorflow backend seems to produce wrong values.
With only tensorflow or plaidml via keras (or, in the case of tf, also tf.keras) and without ngraph-tf, it runs without a problem (
USE_TF_KERAS=1/0 KERAS_BACKEND="tensorflow" python test_ngrapg_tf.py NONE
). Those tests were made with a self-built version of ngraph-tf, with and without the --use_prebuilt_tensorflow parameter.
Using the CPU ngraph backend, it runs both with keras (tensorflow as keras backend) and with tf.keras, although much slower than plain tensorflow-cpu without ngraph in both cases. Additionally, when using keras with the backend set to tensorflow, the results seem to be wrong.
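For anyone trying to reproduce the invocations above without the original script, here is a rough sketch of how the switches could be read. This is not the actual test_ngrapg_tf.py: the function name parse_run_config, the defaults, and the "CPU" value are assumptions for illustration; only USE_TF_KERAS, KERAS_BACKEND, PLAIDML and NONE appear in the commands above. It deliberately does not import tensorflow or ngraph-tf.

```python
import os
import sys

# PLAIDML and NONE come from the commands above; "CPU" is an assumed name
# for selecting the ngraph CPU backend.
VALID_BACKENDS = ("NONE", "CPU", "PLAIDML")


def parse_run_config(argv, environ):
    """Return (use_tf_keras, keras_backend, ngraph_backend) for one run.

    Hypothetical helper: defaults are assumptions, not taken from the
    original script.
    """
    use_tf_keras = environ.get("USE_TF_KERAS", "1") == "1"
    keras_backend = environ.get("KERAS_BACKEND", "tensorflow")
    ngraph_backend = argv[1].upper() if len(argv) > 1 else "NONE"
    if ngraph_backend not in VALID_BACKENDS:
        raise ValueError("unknown ngraph backend: %s" % ngraph_backend)
    return use_tf_keras, keras_backend, ngraph_backend


if __name__ == "__main__":
    print(parse_run_config(sys.argv, os.environ))
```

Printing this tuple at the top of each run makes it easier to match a crash to the exact combination that triggered it.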
When trying to run it with the ngraph CPU backend via the PyPI version of ngraph-tf installed via pip, I get an
Illegal instruction
crash with both keras->tensorflow and tf.keras.
Additional info
I am using python 3.5.5 installed via pyenv.
GPU: Radeon RX 580
When compiling ngraph-tf I need to create a link from lib64 to lib in the artifact dir, otherwise the ngraph-tf build fails, as it expects the lib dir but creates the lib64 dir (not sure if relevant).
Sorry for the wall of text, but I really don't know where it goes wrong. Please let me know if additional information is required.
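In case it helps with reproducing the build, the lib64 workaround mentioned above can be sketched roughly as follows. The helper name ensure_lib_symlink is made up for illustration, and the path of the artifact dir has to be supplied by the caller. It assumes the build produced a lib64 directory and that a symlink named lib pointing at it is what the later build step wants.

```python
from pathlib import Path


def ensure_lib_symlink(artifacts_dir):
    """Create `lib` -> `lib64` inside artifacts_dir if it is missing.

    Returns True if a link was created, False otherwise.
    Hypothetical helper; not part of the ngraph-tf build scripts.
    """
    artifacts = Path(artifacts_dir)
    lib64 = artifacts / "lib64"
    lib = artifacts / "lib"
    # Only act when lib64 exists and nothing named lib is already there.
    if lib64.is_dir() and not lib.exists() and not lib.is_symlink():
        # Relative target, so the link survives moving the artifact dir.
        lib.symlink_to("lib64", target_is_directory=True)
        return True
    return False
```

Running it twice is harmless: the second call finds the existing link and does nothing.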