cerndb / dist-keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
http://joerihermans.com/work/distributed-keras/
GNU General Public License v3.0

jupyter kernel dies when running /examples/workflow.ipynb #74

Closed reedv closed 6 years ago

reedv commented 6 years ago

System:
OS: CentOS7
Spark version: 2.1.0
py4j version: py4j-0.10.4
dist-keras installed via pip install -e . (see https://github.com/cerndb/dist-keras#git--pip)

When running the example workflow, the Jupyter kernel reports that it has died and restarts automatically whenever the interpreter reaches either of the following import sections:

from keras.optimizers import *
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation

or

from distkeras.trainers import *
from distkeras.predictors import *
from distkeras.transformers import *
from distkeras.evaluators import *
from distkeras.utils import *

The Jupyter notebook server output during this time shows no error messages (only that the kernel is restarting).

reedv commented 6 years ago

To run the workflow example outside of Jupyter Notebook, I converted it to a .py file with jupyter nbconvert --to python workflow.ipynb. Running the resulting script shows the error:

(keras-dist_test) ➜  examples git:(master) ✗ python workflow.py
Using TensorFlow backend.
[1]    333 illegal hardware instruction (core dumped)  python workflow.py

when importing keras.
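Since the notebook server logs show nothing, one way to narrow down which import is crashing is to run each import in a separate child interpreter and inspect how it exited. The following is only a sketch: the helper names try_import and describe are hypothetical, and reading a negative return code as "killed by a signal" assumes a POSIX system such as the CentOS machine above.

```python
import signal
import subprocess
import sys


def try_import(module_name):
    """Import module_name in a child interpreter and return its exit code."""
    result = subprocess.run(
        [sys.executable, "-c", "import " + module_name],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode


def describe(returncode):
    """Turn a child exit code into a short human-readable message."""
    if returncode == 0:
        return "import succeeded"
    if returncode < 0:
        # On POSIX, a negative code means the child was killed by that signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal {}".format(-returncode)
        return "killed by " + name
    return "failed with exit code {}".format(returncode)


if __name__ == "__main__":
    # Module names taken from the import sections quoted in this issue.
    for name in ("tensorflow", "keras", "distkeras.trainers"):
        print(name, "->", describe(try_import(name)))
```

Because the crash happens in a child process, the calling interpreter (or notebook kernel) survives and can report the result. A "killed by SIGILL" line would correspond to the "illegal hardware instruction" message printed by the shell.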

reedv commented 6 years ago

Following the solutions described here (https://stackoverflow.com/a/49388337/8236733) and here (https://github.com/tensorflow/tensorflow/issues/17411#issuecomment-370260582) worked for me:

pip install tensorflow==1.5

Still wondering if anything else can be done that avoids downgrading. Will tensorflow 1.5 still be enough to use all of the functionality of dist-keras?
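As far as I understand the linked TensorFlow issue, the prebuilt binaries from 1.6 onward use AVX instructions, so the crash is expected on CPUs that lack them. A quick way to check whether that applies to a given machine is to look for the avx flag in /proc/cpuinfo. This is a rough sketch that assumes an x86 Linux system, where the relevant line is named "flags"; the function names are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Assumption: x86 Linux, where the feature list is on "flags" lines.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text):
    """Return True if the exact flag 'avx' is present."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print("AVX supported:", has_avx(handle.read()))
    except OSError:
        print("Could not read /proc/cpuinfo (not a Linux system?)")
```

If this reports that AVX is missing, the options I am aware of are staying on tensorflow 1.5 or using a TensorFlow build compiled without AVX; if AVX is present, the crash likely has a different cause.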