cerndb / dist-keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
http://joerihermans.com/work/distributed-keras/
GNU General Public License v3.0

mnist #5

Closed sodaling closed 7 years ago

sodaling commented 7 years ago

How do I run the MNIST example? Hoping for your answer.

JoeriHermans commented 7 years ago

Hi sodaling,

There is an MNIST example on the development branch. I will push it to master by the end of the week. Some new features will be added which will significantly increase the performance. Anyway, if you would like to test an MNIST example, just clone the development branch (or wait until next week :)).

I hope this helped.

Kind regards,

Joeri

sodaling commented 7 years ago

Thank you very much for your patience. When I run your MNIST code, the following error occurs: "Py4JJavaError: An error occurred while calling o484.load. : java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.csv."

sodaling commented 7 years ago

Also, I ran your code in Jupyter.

JoeriHermans commented 7 years ago

Yes, the examples are meant to be run as Jupyter notebooks. I will add this to the README in the next release. Using Jupyter notebooks, I can give some additional information to the users :)

Are you sure you are executing the following before creating a Spark context?

import os

# Use the DataBricks CSV reader, this has some nice functionality regarding invalid values.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-csv_2.10:1.4.0 pyspark-shell'
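
For completeness, the full intended order looks roughly like this (a minimal sketch; the application name, master URL and CSV path are placeholders, not part of the dist-keras examples):

import os

# Request the Databricks CSV package before pyspark starts the JVM.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-csv_2.10:1.4.0 pyspark-shell'

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# Placeholder application name and master; adjust to your cluster.
conf = SparkConf().setAppName("mnist-example").setMaster("local[*]")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# The Databricks CSV data source is now available.
raw = sqlContext.read.format("com.databricks.spark.csv") \
                .options(header="true", inferSchema="true") \
                .load("mnist_train.csv")  # placeholder path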

Joeri

sodaling commented 7 years ago

That is exactly why I opened this issue... When I run your MNIST example, it prints ValueError: Cannot run multiple SparkContexts at once; existing SparkContext(app=PySparkShell, master=local[*]) created by <module> at.... I think it's because the PySpark shell already created an 'sc' before I could set PYSPARK_SUBMIT_ARGS and create a new SparkContext.

I modified my ~/.bashrc to

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8880"

and ran /usr/local/spark/bin/pyspark directly. Am I running Jupyter with PySpark the wrong way? How do you run Jupyter with PySpark? Sorry, I'm really a newbie to PySpark and Jupyter.

JoeriHermans commented 7 years ago

Personally, I do it like this:

  1. I start a Jupyter notebook server on one of the cluster nodes* by executing the following command:
    jupyter notebook --ip='*' --no-browser
  2. Then I open a browser window on my local machine and go to http://[host]:8888. This takes me to the directory where I started Jupyter. Afterwards, I just navigate to my project folder and run the notebooks without any further configuration.

I think if you execute the steps as I described them here, you should be fine. The thing is, if you start the PySpark shell, it automatically creates a SparkContext. So if you need, for example, the Databricks CSV reader, there is no way to request it anymore (since the SparkContext has already been started).
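
If the plain Jupyter kernel then complains that it cannot import pyspark, one common fix is to put Spark's Python bindings on the path before importing it. A minimal sketch, assuming Spark is installed in /usr/local/spark as in the commands above:

import os
import sys
import glob

# Assumption: Spark lives in /usr/local/spark, as in the commands earlier in this thread.
spark_home = '/usr/local/spark'
os.environ['SPARK_HOME'] = spark_home

# Make the bundled pyspark and py4j modules importable from a plain Jupyter kernel.
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, glob.glob(os.path.join(spark_home, 'python', 'lib', 'py4j-*-src.zip'))[0])

import pyspark  # should now import without going through the PySpark shell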

I hope this helps. If it doesn't, let me know.

Joeri

sodaling commented 7 years ago

Your description is very clear, but following your guideline it prints that I need to import pyspark. I think I'm missing some Jupyter configuration. Sorry to bother you; I'm just a newbie interested in your work.

JoeriHermans commented 7 years ago

Could you paste the error message here? Because this seems pretty strange to me.

No problem, we were all there once :)

sodaling commented 7 years ago

Anyway, I ran your MNIST example in Spark locally with "/usr/local/spark/bin/spark-submit --packages com.databricks:spark-csv_2.10:1.4.0 /home/hpcc/test/minist.py". I think it ran successfully, but it got stuck on the training step:

16/12/12 21:25:29 INFO PythonRunner: Times: total = 60030, boot = 6, init = 818, finish = 59206
16/12/12 21:25:30 INFO PythonRunner: Times: total = 60577, boot = 14, init = 1434, finish = 59129
16/12/12 21:25:32 INFO PythonRunner: Times: total = 60169, boot = 13, init = 1624, finish = 58532
16/12/12 21:25:32 INFO MemoryStore: Block rdd_37_0 stored as values in memory (estimated size 118.3 MB, free 123.7 MB)
16/12/12 21:25:32 INFO BlockManagerInfo: Added rdd_37_0 in memory on localhost:34718 (size: 118.3 MB, free: 388.7 MB)
16/12/12 21:25:32 INFO GeneratePredicate: Code generated in 19.821608 ms
16/12/12 21:25:32 INFO GenerateColumnAccessor: Code generated in 27.78954 ms
Using Theano backend.
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.

I think it is just running too slowly. Can I see verbose output like in Keras?

JoeriHermans commented 7 years ago

Ah! You are right, the examples run fine. However, you should install the gcc / g++ compiler on your machines, because Theano compiles the Keras model to C++.
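
As a quick sanity check (assuming the Theano backend, as in the log above), you can ask Theano which C++ compiler it found; an empty value means it has fallen back to the slow pure-Python implementations:

import theano

# Path of the C++ compiler Theano detected; an empty string means g++ was not found
# and training will run on the slow Python fallback.
print(theano.config.cxx)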

sodaling commented 7 years ago

Thanks for your long reply! I have learned a lot from your work. Your code is very clear and fantastic. Also, can I see verbose output while training?

JoeriHermans commented 7 years ago

No problem!

Sadly no, but I'm planning to add this in a future release. If that is OK with you, I will mark this issue as closed.