databricks / spark-deep-learning

Deep Learning Pipelines for Apache Spark
https://databricks.github.io/spark-deep-learning
Apache License 2.0

java.lang.NoClassDefFoundError: com/typesafe/scalalogging/slf4j/LazyLogging #74

Open ankamv opened 7 years ago

ankamv commented 7 years ago

I am using the pyspark shell, launched with the command below, on an EMR cluster (Spark 2.1.1; tried Python 2.7.12 and Anaconda Python 3.5.4):

```
pyspark --master local[2] --packages databricks:spark-deep-learning:0.1.0-spark2.1-s_2.11,databricks:tensorframes:0.2.9-s_2.11 --jars /home/hadoop/scala-logging-slf4j_2.11-2.1.2.jar
```

Trying to run logistic regression on a training data set:

```python
featurizer = DeepImageFeaturizer(inputCol="image", outputCol="features", modelName="InceptionV3")
lr = LogisticRegression(maxIter=20, regParam=0.05, elasticNetParam=0.3, labelCol="label")
p = Pipeline(stages=[featurizer, lr])
p_model = p.fit(train_df)
```

and getting the error below:

```
>>> p_model = p.fit(train_df)
2017-10-30 18:14:48.155851: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-30 18:14:48.155872: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-30 18:14:48.155882: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-30 18:14:48.155891: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-30 18:14:48.155899: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
INFO:tensorflow:Froze 376 variables.
Converted 376 variables to const ops.
Using TensorFlow backend.
Using TensorFlow backend.
INFO:tensorflow:Froze 0 variables.
Converted 0 variables to const ops.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/spark/python/pyspark/ml/base.py", line 64, in fit
    return self._fit(dataset)
  File "/usr/lib/spark/python/pyspark/ml/pipeline.py", line 106, in _fit
    dataset = stage.transform(dataset)
  File "/usr/lib/spark/python/pyspark/ml/base.py", line 105, in transform
    return self._transform(dataset)
  File "/mnt/tmp/spark-0449092f-236e-4f87-910f-2381c72b82c1/userFiles-af2cdd0f-fd63-4845-8e3f-5b6220423978/databricks_spark-deep-learning-0.1.0-spark2.1-s_2.11.jar/sparkdl/transformers/named_image.py", line 159, in _transform
  File "/usr/lib/spark/python/pyspark/ml/base.py", line 105, in transform
    return self._transform(dataset)
  File "/mnt/tmp/spark-0449092f-236e-4f87-910f-2381c72b82c1/userFiles-af2cdd0f-fd63-4845-8e3f-5b6220423978/databricks_spark-deep-learning-0.1.0-spark2.1-s_2.11.jar/sparkdl/transformers/named_image.py", line 222, in _transform
  File "/usr/lib/spark/python/pyspark/ml/base.py", line 105, in transform
    return self._transform(dataset)
  File "/mnt/tmp/spark-0449092f-236e-4f87-910f-2381c72b82c1/userFiles-af2cdd0f-fd63-4845-8e3f-5b6220423978/databricks_spark-deep-learning-0.1.0-spark2.1-s_2.11.jar/sparkdl/transformers/tf_image.py", line 142, in _transform
  File "/mnt/tmp/spark-0449092f-236e-4f87-910f-2381c72b82c1/userFiles-af2cdd0f-fd63-4845-8e3f-5b6220423978/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 264, in map_rows
  File "/mnt/tmp/spark-0449092f-236e-4f87-910f-2381c72b82c1/userFiles-af2cdd0f-fd63-4845-8e3f-5b6220423978/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 150, in _map
  File "/mnt/tmp/spark-0449092f-236e-4f87-910f-2381c72b82c1/userFiles-af2cdd0f-fd63-4845-8e3f-5b6220423978/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 34, in _java_api
  File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o177.loadClass.
: java.lang.NoClassDefFoundError: com/typesafe/scalalogging/slf4j/LazyLogging
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.typesafe.scalalogging.slf4j.LazyLogging
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 36 more
```

phi-dbq commented 7 years ago

Hi @ankamv, thank you for reporting the issue. Can you try passing both scala-logging jars when launching pyspark?

```
--jars <YOUR_LIB_ROOT>/scala-logging-slf4j_2.11-2.1.2.jar,<YOUR_LIB_ROOT>/scala-logging-api_2.11-2.1.2.jar
```

Also related: for testing purposes, please limit the size of your train_df if you are running locally.
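Putting the suggestion together with the original launch command, the full invocation would look something like the sketch below. This is an assumption, not a verified fix: the jar paths mirror the `/home/hadoop` location from the original command and may differ on your cluster.

```shell
# Sketch: original --packages list plus BOTH scala-logging jars on --jars.
# Paths under /home/hadoop are assumed from the original report.
pyspark --master local[2] \
  --packages databricks:spark-deep-learning:0.1.0-spark2.1-s_2.11,databricks:tensorframes:0.2.9-s_2.11 \
  --jars /home/hadoop/scala-logging-slf4j_2.11-2.1.2.jar,/home/hadoop/scala-logging-api_2.11-2.1.2.jar
```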

ankamv commented 7 years ago

Same issue again. I have 6 images of ~1 MB each in my train_df.

I see the following jars from sc._conf.getAll():

```
('spark.jars', 'file:/home/hadoop/scala-logging-slf4j_2.11-2.1.2.jar,file:/home/hadoop/scala-logging-api_2.11-2.1.2.jar,file:/home/hadoop/.ivy2/jars/databricks_spark-deep-learning-0.1.0-spark2.1-s_2.11.jar,file:/home/hadoop/.ivy2/jars/databricks_tensorframes-0.2.9-s_2.11.jar,file:/home/hadoop/.ivy2/jars/com.typesafe.scala-logging_scala-logging-api_2.11-2.1.2.jar,file:/home/hadoop/.ivy2/jars/com.typesafe.scala-logging_scala-logging-slf4j_2.11-2.1.2.jar,file:/home/hadoop/.ivy2/jars/org.slf4j_slf4j-api-1.7.7.jar,file:/home/hadoop/.ivy2/jars/org.apache.commons_commons-proxy-1.0.jar,file:/home/hadoop/.ivy2/jars/org.scalactic_scalactic_2.11-3.0.0.jar,file:/home/hadoop/.ivy2/jars/org.apache.commons_commons-lang3-3.4.jar,file:/home/hadoop/.ivy2/jars/org.tensorflow_tensorflow-1.3.0.jar,file:/home/hadoop/.ivy2/jars/org.scala-lang_scala-reflect-2.11.8.jar,file:/home/hadoop/.ivy2/jars/org.tensorflow_libtensorflow-1.3.0.jar,file:/home/hadoop/.ivy2/jars/org.tensorflow_libtensorflow_jni-1.3.0.jar')
```

I tried adding the two logging jars via --py-files as well, but that did not help.
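For what it's worth, one quick way to double-check what the driver actually received is to pull `spark.jars` out of the conf and confirm both scala-logging artifacts are listed. A minimal sketch, runnable on its own: the `spark_jars` string below is abridged from the output above, while in a live shell it would come from `sc._conf.getAll()`.

```python
# Abridged value of spark.jars from the output above; in a live pyspark
# shell you would obtain it with:
#   spark_jars = dict(sc._conf.getAll()).get("spark.jars", "")
spark_jars = (
    "file:/home/hadoop/scala-logging-slf4j_2.11-2.1.2.jar,"
    "file:/home/hadoop/scala-logging-api_2.11-2.1.2.jar,"
    "file:/home/hadoop/.ivy2/jars/databricks_spark-deep-learning-0.1.0-spark2.1-s_2.11.jar,"
    "file:/home/hadoop/.ivy2/jars/databricks_tensorframes-0.2.9-s_2.11.jar"
)

# Keep only the jar file names.
jar_names = [uri.rsplit("/", 1)[-1] for uri in spark_jars.split(",") if uri]

# Both scala-logging artifacts must be on the JVM classpath; --py-files
# would not help here because it only distributes Python files.
required = ("scala-logging-slf4j", "scala-logging-api")
missing = [r for r in required if not any(r in j for j in jar_names)]
print("missing:", missing)  # an empty list means both jars made it onto spark.jars
```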

phi-dbq commented 7 years ago

Hi @ankamv, can you try running your example locally on your dev box or laptop to see if it works?

balkon16 commented 4 years ago

Hi @phi-dbq ,

I have the same problem as @ankamv. I tried starting PySpark with the following command:

```
$SPARK_HOME/bin/pyspark --packages databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11,databricks:tensorframes:0.8.2-s_2.11 --jars scala-logging_2.12-3.9.2.jar,scala-logging-slf4j_2.11-2.1.2.jar
```

but got the same error as in the original question.
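One thing worth checking here, offered as an observation rather than a confirmed fix: the missing class `com.typesafe.scalalogging.slf4j.LazyLogging` only exists in the old scala-logging-slf4j 2.x artifact. In scala-logging 3.x (such as `scala-logging_2.12-3.9.2.jar`) the class moved to `com.typesafe.scalalogging.LazyLogging`, so that jar cannot satisfy this reference, and a `_2.12` jar also mixes Scala binary versions with the `_2.11` packages. A sketch of a launch command that stays on Scala 2.11 and the 2.x logging artifacts (the `./` jar paths are placeholders):

```shell
# Sketch: match the _2.11 packages with the 2.x scala-logging artifacts
# that still contain com.typesafe.scalalogging.slf4j.LazyLogging.
# Jar locations are placeholders; adjust to where the jars actually live.
$SPARK_HOME/bin/pyspark \
  --packages databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11,databricks:tensorframes:0.8.2-s_2.11 \
  --jars ./scala-logging-slf4j_2.11-2.1.2.jar,./scala-logging-api_2.11-2.1.2.jar
```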