databricks / spark-deep-learning

Deep Learning Pipelines for Apache Spark
https://databricks.github.io/spark-deep-learning
Apache License 2.0

java.lang.ClassNotFoundException: org.tensorframes.impl.DebugRowOps #189

Closed Liangmp closed 5 years ago

Liangmp commented 5 years ago

I am following the steps in Making Image Classification Simple With Spark Deep Learning, running the code line by line in ./pyspark --master yarn. The code is shown below:

from sparkdl import readImages
from pyspark.sql.functions import lit

img_dir = "hdfs:///personalities/"

#Read images and Create training & test DataFrames for transfer learning
jobs_df = readImages(img_dir + "/jobs").withColumn("label", lit(1))
zuckerberg_df = readImages(img_dir + "/zuckerberg").withColumn("label", lit(0))
jobs_train, jobs_test = jobs_df.randomSplit([0.6, 0.4])
zuckerberg_train, zuckerberg_test = zuckerberg_df.randomSplit([0.6, 0.4])

#dataframe for training a classification model
train_df = jobs_train.unionAll(zuckerberg_train)

#dataframe for testing the classification model
test_df = jobs_test.unionAll(zuckerberg_test)

from pyspark.ml.classification import LogisticRegression
from pyspark.ml import Pipeline
from sparkdl import DeepImageFeaturizer

featurizer = DeepImageFeaturizer(inputCol="image", outputCol="features", modelName="InceptionV3")
lr = LogisticRegression(maxIter=20, regParam=0.05, elasticNetParam=0.3, labelCol="label")
p = Pipeline(stages=[featurizer, lr])
p_model = p.fit(train_df)

But I am stuck at p_model = p.fit(train_df), which fails with the following error:

...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/spark-2.4.0-bin-hadoop2.7/python/pyspark/ml/base.py", line 132, in fit
    return self._fit(dataset)
  File "/opt/spark-2.4.0-bin-hadoop2.7/python/pyspark/ml/pipeline.py", line 107, in _fit
    dataset = stage.transform(dataset)
  File "/opt/spark-2.4.0-bin-hadoop2.7/python/pyspark/ml/base.py", line 173, in transform
    return self._transform(dataset)
  File "/usr/local/lib/python3.6/dist-packages/sparkdl/transformers/named_image.py", line 158, in _transform
    return transformer.transform(dataset)
  File "/opt/spark-2.4.0-bin-hadoop2.7/python/pyspark/ml/base.py", line 173, in transform
    return self._transform(dataset)
  File "/usr/local/lib/python3.6/dist-packages/sparkdl/transformers/named_image.py", line 221, in _transform
    result = tfTransformer.transform(dataset.withColumn(resizedCol, resizeUdf(inputCol)))
  File "/opt/spark-2.4.0-bin-hadoop2.7/python/pyspark/ml/base.py", line 173, in transform
    return self._transform(dataset)
  File "/usr/local/lib/python3.6/dist-packages/sparkdl/transformers/tf_image.py", line 137, in _transform
    "image_buffer": "__sdl_image_data"})
  File "/usr/local/lib/python3.6/dist-packages/tensorframes/core.py", line 264, in map_rows
    return _map(fetches, dframe, feed_dict, block=False, trim=None, initial_variables=initial_variables)
  File "/usr/local/lib/python3.6/dist-packages/tensorframes/core.py", line 150, in _map
    builder = _java_api().map_rows(dframe._jdf)
  File "/usr/local/lib/python3.6/dist-packages/tensorframes/core.py", line 34, in _java_api
    return _jvm.Thread.currentThread().getContextClassLoader().loadClass(javaClassName) \
  File "/opt/spark-2.4.0-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/spark-2.4.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/spark-2.4.0-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o190.loadClass.
: java.lang.ClassNotFoundException: org.tensorframes.impl.DebugRowOps
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

Can somebody help me? Thanks a lot.
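The missing class org.tensorframes.impl.DebugRowOps lives in the tensorframes jar (a dependency of sparkdl), so the JVM side of this session cannot see that jar. A quick hedged check, assuming the default Ivy cache location that --packages resolves into (~/.ivy2/jars):

```shell
# Hypothetical diagnostic: if the tensorframes jar was never fetched onto
# this machine, nothing will match and the fallback message is printed.
ls ~/.ivy2/jars 2>/dev/null | grep -i tensorframes || echo "tensorframes jar not found"
```

If the jar is absent, the Python package was installed (via pip) without its JVM counterpart, which is exactly the situation the --packages flag fixes.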

Liangmp commented 5 years ago

Problem solved after adding the --packages argument:

./pyspark --master yarn --packages databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11
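For reference, the same dependency can be supplied through Spark configuration instead of the command-line flag. A sketch (the coordinate must match your own Spark/Scala build; 1.5.0-spark2.4-s_2.11 pairs with Spark 2.4 / Scala 2.11):

```shell
# 1) Equivalent --conf form of the fix above
./pyspark --master yarn \
  --conf spark.jars.packages=databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11

# 2) Persistent form: add one line to conf/spark-defaults.conf
# spark.jars.packages  databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11
```

Either way, Spark resolves the package (and its tensorframes dependency) from the Spark Packages repository and puts the jars on both driver and executor classpaths.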
erkansirin78 commented 5 years ago

No, it doesn't work for me.
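One plausible reason the --packages fix still fails (an assumption, not confirmed in this thread) is a coordinate that doesn't match the local Spark/Scala build; the coordinate in the fix above encodes both versions:

```shell
# Coordinate layout, inferred from the working example above:
#   databricks:spark-deep-learning:<release>-spark<spark major.minor>-s_<scala version>
# Check which suffix your installation needs; this prints the Spark version
# and the Scala version Spark was built with:
./pyspark --version
```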

vishalgoel071 commented 3 years ago

Still looking for a solution to this. Has anyone found one?