I had the same issue on #172, but it seems that if you change your last line to this:
featurePipeline.serializeToBundle("jar:file:/tmp/pyspark.example.zip", featurePipeline.transform(df2))
it should work. It seems the documentation has not been updated to reflect this.
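A minimal, self-contained sketch of that suggested call, assuming a fitted pipeline (the tiny DataFrame and column names here are illustrative, not from the original issue):

import mleap.pyspark  # must run before serializeToBundle is used
from mleap.pyspark.spark_support import SimpleSparkSerializer
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df2 = spark.createDataFrame([("a",), ("b",)], ["name"])

# serializeToBundle is called on the *fitted* pipeline (a PipelineModel)
# and takes the target URI plus a transformed DataFrame.
featurePipeline = Pipeline(stages=[StringIndexer(inputCol="name", outputCol="name_idx")]).fit(df2)
featurePipeline.serializeToBundle("jar:file:/tmp/pyspark.example.zip",
                                  featurePipeline.transform(df2))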
@dvaldivia @priyeshkap thanks for raising this issue, once https://github.com/combust/mleap-docs/pull/11 has been merged, the documentation should be up to date.
@dvaldivia @ancasarb Merged and published
@dvaldivia Hey, I have the same error as @priyeshkap. But it does not look like the way serializeToBundle is called is wrong; rather, the featurePipeline object, which is a Pipeline, does not have any function called serializeToBundle, so it can't be called from there. I've tried both syntaxes; neither works.
Any other suggestions?
Since we are still referencing the pyspark object, I would guess that mleap needs to alter part of the code to attach this serializeToBundle function, but that does not seem to be happening here.
I've tried both the approach detailed in the documentation and the one detailed by @dvaldivia, and I'm still getting the AttributeError: 'Pipeline' object has no attribute 'serializeToBundle' error as well. Screenshot attached.
@nathanaelmouterde @robperc make sure you are importing mleap before pyspark:
import mleap.pyspark
from mleap.pyspark.spark_support import SimpleSparkSerializer
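One quick way to check that the patch was applied (the hasattr probe below is our own diagnostic, not something from the MLeap docs):

import mleap.pyspark  # these two imports patch serializeToBundle onto PySpark's ML classes
from mleap.pyspark.spark_support import SimpleSparkSerializer
from pyspark.ml import PipelineModel

# Should print True once mleap has patched PySpark; if it prints False,
# the imports above did not run before the pipeline code.
print(hasattr(PipelineModel, 'serializeToBundle'))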
@dvaldivia Yes, those are already the first two lines of my script.
@nathanaelmouterde What version of Spark are you running? I encountered this issue (https://github.com/combust/mleap/issues/363) ... I downgraded from v2.3 to v2.2 and the issue you describe was no longer a problem. It seems MLeap isn't ready for Spark v2.3 yet. :) Update: as of MLeap v0.10.0, Spark 2.3 is supported.
I encountered a similar problem. The error message is as follows:
py4j.protocol.Py4JJavaError: An error occurred while calling o94.serializeToBundle.
: java.lang.NoClassDefFoundError: resource/package$
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundleWithFormat(SimpleSparkSerializer.scala:25)
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundle(SimpleSparkSerializer.scala:17)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: resource.package$
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 13 more
When I run my script, an exception is raised at the statement:
fitted_pipeline.serializeToBundle("jar:file:/tmp/pyspark.example.zip", fitted_pipeline.transform(df2))
Here is my script.
import mleap.pyspark  # imported first so that serializeToBundle is patched onto the fitted pipeline
from mleap.pyspark.spark_support import SimpleSparkSerializer
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.feature import StandardScaler
from pyspark.ml.feature import OneHotEncoder
from pyspark.ml.feature import StringIndexer
from pyspark.ml import Pipeline, PipelineModel
from pyspark.sql import Row, SparkSession
from pyspark import SparkContext
from pprint import pprint

spark = SparkSession \
    .builder \
    .getOrCreate()
sc = spark.sparkContext

# A tiny two-column DataFrame to drive the pipeline.
l = [('Alice', 10), ('Bob', 12), ('Alice', 13)]
rdd = sc.parallelize(l)
Person = Row('name', 'age')
person = rdd.map(lambda r: Person(*r))
df2 = spark.createDataFrame(person)

# Index the string column, then assemble the index into a feature vector.
string_indexer = StringIndexer(inputCol='name', outputCol='name_string_index')
pprint(string_indexer.getOutputCol())
feature_assembler = VectorAssembler(inputCols=[string_indexer.getOutputCol()],
                                    outputCol='features')
feature_pipeline = Pipeline(stages=[string_indexer, feature_assembler])
fitted_pipeline = feature_pipeline.fit(df2)

# The exception is raised here.
fitted_pipeline.serializeToBundle("jar:file:/tmp/pyspark.example.zip",
                                  fitted_pipeline.transform(df2))
@cappaberra thanks for your comment. I'm not getting this error anymore with MLeap v0.10.0. @nathanaelmouterde and I are using Spark v2.3.
@lie-yan I am getting the same error; were you able to find a resolution for it? I am running MLeap v0.12.0 with Spark 2.3.1.
File "/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1539093246812_0349/container_1539093246812_0349_01_000001/mleap.zip/mleap/pyspark/spark_support.py", line 25, in serializeToBundle
File "/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1539093246812_0349/container_1539093246812_0349_01_000001/mleap.zip/mleap/pyspark/spark_support.py", line 42, in serializeToBundle
File "/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1539093246812_0349/container_1539093246812_0349_01_000001/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1539093246812_0349/container_1539093246812_0349_01_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1539093246812_0349/container_1539093246812_0349_01_000001/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o121.serializeToBundle.
: java.lang.NoClassDefFoundError: resource/package$
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundleWithFormat(SimpleSparkSerializer.scala:25)
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundle(SimpleSparkSerializer.scala:17)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: resource.package$
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 13 more
I tried the script you posted above as well; it fails with the same error.
*UPDATE: the reason for the error below was a missing jar in the spark.jars.packages configuration: ml.combust.mleap:mleap-spark_2.11:0.13.0. BUT the model can be exported only with model.serializeToBundle("file:/mnt/mleap-example", sparkTransformed) and NOT with model.serializeToBundle("jar:file:/tmp/20news_pipeline-json.zip", sparkTransformed), which should work according to the example. I wonder why...
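For reference, the two URI forms that update contrasts, side by side (model and sparkTransformed here are the fitted PipelineModel and transformed DataFrame from the comment above):

# Directory bundle - reported to work:
model.serializeToBundle("file:/mnt/mleap-example", sparkTransformed)

# Zip bundle - the documented form that failed for this commenter:
model.serializeToBundle("jar:file:/tmp/20news_pipeline-json.zip", sparkTransformed)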
I get a similar problem using MLeap 0.13.0 and PySpark 2.4.0. How can I point to the configuration file? There is no evidence in the docs that PySpark needs to be supplied with an MLeap configuration file. BTW, I also tried PySpark 2.3.0 with MLeap 0.13.0, as specified in the docs, but the same error occurs.
import os

import mleap.pyspark
from mleap.pyspark.spark_support import SimpleSparkSerializer
from pyspark.ml import Pipeline
from pyspark.ml.classification import GBTClassifier
from pyspark.sql import SparkSession
from pyspark.ml.feature import StringIndexer, FeatureHasher, StandardScaler, VectorAssembler, OneHotEncoderEstimator
from pyspark.ml import Transformer, Estimator
from pyspark.sql.functions import when

spark = SparkSession.builder.appName('GBT') \
    .config('spark.jars.packages',
            "ml.combust.mleap:mleap-spark-base_2.11:0.13.0,ml.combust.mleap:mleap-runtime_2.11:0.13.0") \
    .getOrCreate()

training = spark.createDataFrame([
    (0, "a b c d e spark", 1.0),
    (1, "b d", 0.0),
    (2, "spark f g h", 1.0),
    (3, "hadoop mapreduce", 0.0)
], ["id", "text", "label"])

test_df = spark.createDataFrame([
    (4, "spark i j k"),
    (5, "l m n"),
    (6, "spark hadoop spark"),
    (7, "apache hadoop")
], ["id", "text"])

# Index and one-hot encode each categorical column.
categoricalCols = ["id", "text"]
stages = []
for cat_col in categoricalCols:
    stringIndexer = StringIndexer(inputCol=cat_col, outputCol=cat_col + 'Index')
    encoder = OneHotEncoderEstimator(inputCols=[stringIndexer.getOutputCol()],
                                     outputCols=[cat_col + "classVec"])
    stages += [stringIndexer, encoder]

HashedInputs = [c + "classVec" for c in categoricalCols]
assembler = VectorAssembler(inputCols=HashedInputs, outputCol="features")
stages += [assembler]

gbt = GBTClassifier(maxBins=4, maxDepth=4, maxIter=5)
stages += [gbt]

pipeline = Pipeline(stages=stages)
model = pipeline.fit(training)
sparkTransformed = model.transform(training)

model_name_export = "gbt_pipeline.zip"
model_name_path = os.getcwd()
model_file = os.path.join(model_name_path, model_name_export)
model_file_path = "jar:file:{}".format(model_file)
model.serializeToBundle(model_file_path, sparkTransformed)
giving:
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 19 | 18 | 18 | 1 || 18 | 18 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-4f79d946-c242-4400-8619-8055519379e3
confs: [default]
18 artifacts copied, 0 already retrieved (16182kB/35ms)
2019-01-24 14:12:06 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
File "test_mlflow_mleap.py", line 67, in
@RaghavendraSingh I am failing with the same error; have you solved it?
@SoloBean, @RaghavendraSingh, @lie-yan - have you been able to overcome this error?
java.lang.NoClassDefFoundError: resource/package$
I am using PySpark 2.3.1 with MLeap 0.8.1 installed from PyPI. If the jar-with-dependencies build helped you, can you share its location or the steps taken to produce it from the project?
I think this issue is related to https://github.com/combust/mleap/issues/257, about automatic resource configuration. There are two related open issues: https://github.com/combust/mleap-docs/issues/8 and https://github.com/combust/mleap/issues/343.
: java.lang.NoClassDefFoundError: resource/package$ at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundleWithFormat(SimpleSparkSerializer.scala:25)
I am using MLeap 0.8.1; it pulls in scala-arm_2.11;2.0 and Java 1.8.0_192, but it seems that something is still wrong with the dependencies (resource is the scala-arm package, so the error suggests that jar is not actually on the classpath at runtime). @hollinwilkins - please advise.
It turns out that it was a dependency problem. I had to go to the Maven repo, hunt down all the compile-time dependencies, and include them in my jars folder.
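An alternative to hunting down jars by hand is to let Spark's own Ivy resolution fetch MLeap and its transitive dependencies. A sketch (the coordinate below is illustrative; the Scala suffix and version must match your Spark build):

from pyspark.sql import SparkSession

# spark.jars.packages takes Maven coordinates; Spark resolves the
# transitive dependency graph when the session starts.
spark = (SparkSession.builder
         .appName("mleap-export")
         .config("spark.jars.packages",
                 "ml.combust.mleap:mleap-spark_2.11:0.13.0")
         .getOrCreate())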
I am also experiencing this issue:
py4j.protocol.Py4JJavaError: An error occurred while calling o119.serializeToBundle.
: java.lang.NoClassDefFoundError: resource/package$
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundleWithFormat(SimpleSparkSerializer.scala:25)
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundle(SimpleSparkSerializer.scala:17)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
...
I believe that I have all the required jars. I had to download these (a sketch of wiring them into a session follows the list):
mleap-base_2.11-0.13.0.jar
mleap-core_2.11-0.13.0.jar
mleap-runtime_2.11-0.13.0.jar
mleap-spark_2.11-0.13.0.jar
mleap-spark-base_2.11-0.13.0.jar
mleap-tensor_2.11-0.13.0.jar
bundle-hdfs_2.11-0.13.0.jar
bundle-ml_2.11-0.13.0.jar
scalapb-runtime_2.11-0.9.0-RC1.jar
lenses_2.11-0.9.0-RC1.jar
config-1.3.4.jar
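One way to point a session at hand-downloaded jars like these is the spark.jars setting, which takes a comma-separated list of local jar paths. A sketch, assuming the jars above sit in a local directory of your choosing:

import os
from pyspark.sql import SparkSession

jar_dir = "/opt/mleap-jars"  # assumed location of the jars listed above
jars = ",".join(os.path.join(jar_dir, j)
                for j in os.listdir(jar_dir) if j.endswith(".jar"))

spark = (SparkSession.builder
         .config("spark.jars", jars)
         .getOrCreate())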
I added all the dependencies mentioned by @enpinzolas and tried Spark versions 2.3.0, 2.3.3, and 2.4.3. I am getting the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling o97.serializeToBundle.
: java.lang.NoClassDefFoundError: resource.package$
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundleWithFormat(SimpleSparkSerializer.scala:25)
Got it working with Spark 2.3 and 2.4. I added the following dependencies, besides the ones listed by @enpinzolas: scala-arm_2.11-2.0, spray-json_2.11-1.3.5, protobuf-java-3.8.0.
For those of you using amazon emr pyspark: https://github.com/combust/mleap-docs/issues/23
Closing this issue as an effort to clean up some older issues, please re-open if there are still unanswered questions, thank you!
I am experiencing errors while trying to set up MLeap, similar to those reported in https://github.com/combust/mleap/issues/172, which is now marked as closed.
I am trying to run the simple spark example: http://mleap-docs.combust.ml/py-spark/ using an AWS EMR cluster. After logging into the master node I run this shell script to install the necessary packages:
Then I run the following code from the simple tutorial:
However I get the following error:
This error has been raised in other issues, and a common solution is to check that the mleap import statements are executed first. I have ensured that this is the case, but I am still unable to run this code. I would be grateful for any advice on how to resolve it.