drkmd8 opened this issue 7 years ago
@drkmd8 Can you give us version information for both the Python and MLeap JVM packages as well as Spark that you are using?
I used Python 3.6.1, MLeap 0.8.1 (pip install mleap), and PySpark 2.1.1 with Hadoop 2.7. I think this problem is caused by the Python MLeap library: Scala seems to work fine, but the Python side requires running with external jar files that contain the MLeap classes.
I have the same issue at the moment as well.
Yes, I have the same issue. My solution is to add the jar files to the PySpark jars directory inside the Python package path, site-packages/pyspark/jars/. I added these jars:
mleap-base_2.11-0.10.0.jar
mleap-core_2.11-0.10.0.jar
mleap-runtime_2.11-0.10.0.jar
mleap-spark_2.11-0.10.0.jar
mleap-spark-base_2.11-0.10.0.jar
mleap-tensor_2.11-0.10.0.jar
I hope this helps.
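The jar-copying step above can be sketched in Python. The directory layout is the one a pip-installed pyspark uses; which jars are actually required depends on your MLeap version, and the function name here is illustrative:

```python
import os
import shutil

def pyspark_jars_dir(site_packages):
    """Return the jars/ directory bundled with a pip-installed pyspark."""
    return os.path.join(site_packages, "pyspark", "jars")

def copy_mleap_jars(src_dir, site_packages):
    """Copy every mleap-*.jar from src_dir into pyspark's jars/ directory
    and return the names that were copied."""
    dest = pyspark_jars_dir(site_packages)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        if name.startswith("mleap-") and name.endswith(".jar"):
            shutil.copy(os.path.join(src_dir, name), dest)
            copied.append(name)
    return copied
```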
I also solved it by adding the jars manually to /usr/lib/spark/jars.
But I guess there is a better way: just sudo pip install jip, then install MLeap. Jip is supposed to take care of your Java dependencies, if I understand correctly.
Hi @alexkayal and @tianhongjie, I have tried your solution. It fixed the JavaPackage issue, but then I got another one:
Py4JJavaError: An error occurred while calling o261.serializeToBundle.
: java.lang.NoClassDefFoundError: com/trueaccord/scalapb/GeneratedEnum
I am not sure how this can happen, since my dataframe only has primitive types {int, double}. Do you have any ideas?
@alexkayal @tianhongjie @Khiem-Tran, I am also getting the same error. Any idea how to resolve this issue?
Py4JJavaError: An error occurred while calling o414.serializeToBundle.
: java.lang.NoClassDefFoundError: com/trueaccord/scalapb/GeneratedEnum
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundle(SimpleSparkSerializer.scala:17)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.trueaccord.scalapb.GeneratedEnum
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 23 more
I fixed the com/trueaccord-related errors by adding lenses_2.11-0.4.12.jar.
@elgalu, thank you so much for your prompt response. I have included lenses_2.11-0.4.12.jar, but I am still getting the same error as above. Do you have any other suggestions for resolving this issue?
Make sure it is on the CLASSPATH. Note that Py4J has its own jars/ folder, and if you install pyspark separately it also comes with its own jars/ folder. What I do is remove all those jars directories and symlink them to a single /jars directory where I put together the whole set of working versions.
You can find all my working jars at: https://github.com/elgalu/jupyter-spark-117/tree/master/spark/jars
Pending: build an sbt or pom.xml project instead of shipping a bunch of jars.
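The "several competing jars/ folders" situation described above can be made visible with a short sketch: walking the install roots shows every jars/ directory that could shadow another. The roots you pass are your own install locations (the directory names in the test are illustrative):

```python
import os

def find_jar_dirs(roots):
    """Walk the given install roots and collect every directory named jars/.
    Py4J and a pip-installed pyspark each ship their own jars/ folder, which
    is how mismatched copies of the same library end up shadowing each other."""
    hits = []
    for root in roots:
        for dirpath, _dirnames, _files in os.walk(root):
            if os.path.basename(dirpath) == "jars":
                hits.append(dirpath)
    return sorted(hits)
```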
> @elgalu, thank you so much for your prompt response. I have included the lenses_2.11-0.4.12.jar but still get the same error as above. Do you have any other suggestions?

I have the same problem. Have you found an answer?
@elgalu @siyouhe666, I have been using Spark 2.2.1 with the config --packages ml.combust.mleap:mleap-spark_2.11:0.11.0. It seems to work for me.
Btw, I also followed this [blog](https://medium.com/@bogdan.cojocar/pyspark-and-xgboost-integration-tested-on-the-kaggle-titanic-dataset-4e75a568bdb) to fix the xgboost dependency, because somehow my mleap-xgboost does not work properly.
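The --packages coordinate used above can also be built and wired in programmatically. A minimal sketch that just assembles the coordinate string; the default versions shown are the ones mentioned in this comment, not a recommendation:

```python
def mleap_spark_package(scala_version="2.11", mleap_version="0.11.0"):
    """Build the Maven coordinate passed to --packages or spark.jars.packages."""
    return "ml.combust.mleap:mleap-spark_{}:{}".format(scala_version, mleap_version)

# With pyspark available, the coordinate is wired in like this (not run here):
# spark = (SparkSession.builder
#          .config("spark.jars.packages", mleap_spark_package())
#          .getOrCreate())
```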
Thanks, I solved this problem by changing my Spark version to 2.4.0. Btw, although the official MLeap docs say 2.4.0 is not yet supported, I found it works well.
I am using Python 3.6 and PySpark 2.3.1, with:
mleap-core_2.11-0.11.0.jar
mleap-spark-base_2.11-0.13.0.jar
mleap-runtime_2.11-0.13.0.jar
I tried two approaches to make PySpark aware of MLeap:
1) git clone the repo and use:
sys.path.append('C:\my-mleap\mleap-master\python')
together with:
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars ....'
2) pip install mleap (which installs version 0.8.1)
When calling
model.serializeToBundle(model_file_path, sparkTransformed)
I get:
Py4JError: ml.combust.mleap.spark.SimpleSparkSerializer does not exist in the JVM
@yairdata I have the same problem. Have you found an answer? Thanks a lot.
@yairdata I solved this problem by adjusting the MLeap version: originally I used 0.13.0, now I use 0.11.0. But it raises another problem:
Py4JJavaError: An error occurred while calling o126.serializeToBundle.
: java.lang.NoClassDefFoundError: com/typesafe/config/ConfigFactory
at org.apache.spark.ml.bundle.SparkBundleContext$.apply(SparkBundleContext.scala:37)
at org.apache.spark.ml.bundle.SparkBundleContext$.defaultContext$lzycompute(SparkBundleContext.scala:31)
at org.apache.spark.ml.bundle.SparkBundleContext$.defaultContext(SparkBundleContext.scala:31)
at ml.combust.mleap.spark.SimpleSparkSerializer$$anonfun$1.apply(SimpleSparkSerializer.scala:22)
at ml.combust.mleap.spark.SimpleSparkSerializer$$anonfun$1.apply(SimpleSparkSerializer.scala:22)
at scala.Option.map(Option.scala:146)
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundleWithFormat(SimpleSparkSerializer.scala:22)
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundle(SimpleSparkSerializer.scala:17)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.typesafe.config.ConfigFactory
@SoloBean - I solved this problem with 0.13.0 by setting spark.jars.packages to ml.combust.mleap:mleap-spark-base_2.11:0.13.0,ml.combust.mleap:mleap-spark_2.11:0.13.0. Now I have another issue of missing jars, but that happens because I am behind a firewall; without the firewall everything works more or less as expected (I was able to export the model, though to a directory and not to a jar file as mentioned in the documentation).
@yairdata - I also worked around that problem by adding the jars, but then got:
Py4JJavaError: An error occurred while calling o126.serializeToBundle.
: java.lang.NoClassDefFoundError: resource/package$
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundleWithFormat(SimpleSparkSerializer.scala:25)
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundle(SimpleSparkSerializer.scala:17)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: resource.package$
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 13 more
@SoloBean - I think there is an open issue about a dependency conflict for that, but I am not sure.
A weird jar dependency issue: I have com.trueaccord.scalapb:scalapb-runtime_2.11:0.6.7 in spark.jars.packages and I can see that it contains the GeneratedEnum class, but I still get the error below. I also tried putting the jar in the pyspark jars directory, and I put the lenses jar mentioned above on the classpath. Is there any other hidden jar dependency that is not referenced in the Maven dependency tree?
The error:
Py4JJavaError: An error occurred while calling o438.serializeToBundle.
: java.lang.NoClassDefFoundError: scalapb/GeneratedEnum
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundle(SimpleSparkSerializer.scala:17)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: scalapb.GeneratedEnum
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 24 more
It took some time to figure this out, so here are the steps to resolve it:
1) The issue as reported by @drkmd8 appears when the Java class in question cannot be accessed, which happens when not all the relevant jars are on the classpath. It can be resolved by passing the jars via the --jars argument or placing them on the classpath.
2) Once that is resolved, you can still hit the issue pointed out by @yairdata. This happens because the JVM is unable to initialise the class: the location being looked at to instantiate the class is wrong.
The most straightforward way to circumvent both issues is to invoke pyspark like this:
pyspark --packages ml.combust.mleap:mleap-spark_2.11:0.11.0
The MLeap version can be chosen according to the compatibility matrix: https://github.com/combust/mleap#mleapspark-version
If the package download fails on some particular jar, that jar can be downloaded manually and placed in the corresponding .m2 directory, and the command re-run. All should be good then.
This does not seem to be an MLeap issue and can be closed by the admins. But I wonder why newer versions of mleap are not published to PyPI.
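For the manual-download fallback mentioned above, the jar has to land exactly where the resolver expects it. Assuming the standard Maven repository layout (group dots become directories, then artifact/version/artifact-version.jar), the target path for a coordinate can be computed like this:

```python
import os

def m2_jar_path(coordinate, m2_root="~/.m2/repository"):
    """Compute where the standard Maven repository layout places the jar for
    a group:artifact:version coordinate."""
    group, artifact, version = coordinate.split(":")
    return os.path.join(
        os.path.expanduser(m2_root),
        *group.split("."),
        artifact,
        version,
        "{}-{}.jar".format(artifact, version),
    )
```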
@SoloBean I am hitting the same resource/package error as you. Did you find a solution? I got:
py4j.protocol.Py4JJavaError: An error occurred while calling o103.serializeToBundle.
: java.lang.NoClassDefFoundError: resource/package$
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundleWithFormat(SimpleSparkSerializer.scala:25)
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundle(SimpleSparkSerializer.scala:17)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: resource.package$
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 13 more
I have the same issue. After adding the jars I then get:
File "/Users/alan/local/spark-2.4.0-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1598, in __getattr__
py4j.protocol.Py4JError: ml.combust.mleap.spark.SimpleSparkSerializer does not exist in the JVM
My versions: Python 2.7.10, pyspark-2.4.0, spark-2.4.0-bin-hadoop2.7.
Jars:
mleap-base_2.11-0.13.0.jar
mleap-core_2.11-0.1.5.jar
mleap-executor_2.11-0.13.0.jar
mleap-runtime_2.11-0.13.0.jar
mleap-spark-base_2.11-0.13.0.jar
mleap-spark-testkit_2.11-0.13.0.jar
mleap-spark_2.11-0.13.0.jar
mleap-tensor_2.11-0.13.0.jar
Hello @hollinwilkins, I have the same issue. I added the jars and am still facing it:
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
response = connection.send_command(command)
File "/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Py4JError: ml.combust.mleap.spark.SimpleSparkSerializer does not exist in the JVM
My versions: Python 3, pyspark-2.4.0, spark-2.4.0-bin-hadoop2.7.
Jars:
mleap-base_2.11-0.13.0.jar
mleap-core_2.11-0.1.5.jar
mleap-executor_2.11-0.13.0.jar
mleap-runtime_2.11-0.13.0.jar
mleap-spark-base_2.11-0.13.0.jar
mleap-spark-testkit_2.11-0.13.0.jar
mleap-spark_2.11-0.13.0.jar
mleap-tensor_2.11-0.13.0.jar
Please help me out with this.
Thanks
It works with MLeap 0.13.0. I verified that using all of the following jars when submitting: spark-submit --master yarn --jars {jar list (each jar has to have the full path!)} my_python.py
com.github.rwl#jtransforms;2.4.0 from central in [default]
com.google.protobuf#protobuf-java;3.5.1 from central in [default]
com.jsuereth#scala-arm_2.11;2.0 from central in [default]
com.lihaoyi#fastparse-utils_2.11;1.0.0 from central in [default]
com.lihaoyi#fastparse_2.11;1.0.0 from central in [default]
com.lihaoyi#sourcecode_2.11;0.1.4 from central in [default]
com.thesamet.scalapb#lenses_2.11;0.7.0-test2 from central in [default]
com.thesamet.scalapb#scalapb-runtime_2.11;0.7.1 from central in [default]
com.typesafe#config;1.3.0 from central in [default]
io.spray#spray-json_2.11;1.3.2 from central in [default]
ml.combust.bundle#bundle-hdfs_2.11;0.13.0 from central in [default]
ml.combust.bundle#bundle-ml_2.11;0.13.0 from central in [default]
ml.combust.mleap#mleap-base_2.11;0.13.0 from central in [default]
ml.combust.mleap#mleap-core_2.11;0.13.0 from central in [default]
ml.combust.mleap#mleap-runtime_2.11;0.13.0 from central in [default]
ml.combust.mleap#mleap-spark-base_2.11;0.13.0 from central in [default]
ml.combust.mleap#mleap-spark_2.11;0.13.0 from central in [default]
ml.combust.mleap#mleap-tensor_2.11;0.13.0 from central in [default]
org.scala-lang#scala-reflect;2.11.8 from central in [default]
Hi @yairdata, how did you manage to find out which versions of the jar files are compatible? Has anyone used mleap 0.15.0 yet?
@y-tee - A lot of trial and error... I wish it were documented somewhere; since it wasn't, I pasted it here to help others.
@yairdata Did you try all the versions? :scream: Then I should probably downgrade my mleap to 0.13.0. It works if I just change the github version.py to 0.13.0 instead of the default (0.15.0), since pip will give you a super old version.
@y-tee Not all versions; there are compatible jar versions, but not all of them are listed as dependencies, so it is trial and error. Regarding newer mleap versions: I didn't try them because I am using an older Spark version (2.3.1) that is compatible with MLeap 0.13.0.
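To make the trial-and-error pairings easier to reuse, here is a small lookup sketch. The mapping is only what commenters in this thread reported working; it is not the official compatibility matrix (see https://github.com/combust/mleap#mleapspark-version for that):

```python
# Pairings reported in this thread only -- not an official matrix.
REPORTED_COMPAT = {
    "2.2": "0.11.0",
    "2.3": "0.13.0",
    "2.4": "0.15.0",
}

def mleap_for_spark(spark_version):
    """Return the MLeap version commenters reported working for a given
    Spark version, keyed on major.minor; raise if nothing was reported."""
    key = ".".join(spark_version.split(".")[:2])
    try:
        return REPORTED_COMPAT[key]
    except KeyError:
        raise ValueError("no reported MLeap version for Spark " + spark_version)
```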
I've released the Python mleap version 0.15.0 just today, FYI: https://pypi.org/project/mleap/#history. Please let me know if you see any issues.
My mleap is 0.15.0, and Spark is 2.4.4, I'm having this issue again.
Code:
pipeline = pipeline.fit(feature_df)
predictions = pipeline.transform(feature_df)
model_local_path = "something"
model_path = "jar:file:" + model_local_path + "/model.zip"
pipeline.serializeToBundle(model_path, predictions)
Errors:
Encountered error: 'PipelineModel' object has no attribute 'serializeToBundle'
Encountered error: An error occurred while calling o1480.serializeToBundle.
: java.lang.ExceptionInInitializerError
at ml.combust.mleap.spark.SimpleSparkSerializer$$anonfun$1.apply(SimpleSparkSerializer.scala:22)
at ml.combust.mleap.spark.SimpleSparkSerializer$$anonfun$1.apply(SimpleSparkSerializer.scala:22)
at scala.Option.map(Option.scala:146)
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundleWithFormat(SimpleSparkSerializer.scala:22)
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundle(SimpleSparkSerializer.scala:17)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: unsupported Spark version: 2.4.4
at org.apache.spark.ml.bundle.SparkBundleContext$.<init>(SparkBundleContext.scala:27)
at org.apache.spark.ml.bundle.SparkBundleContext$.<clinit>(SparkBundleContext.scala)
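Independent of the "unsupported Spark version" error above, one detail worth pinning down in the snippet is the bundle path: serializeToBundle takes a jar:file: URI, and MLeap's examples always put an absolute path inside it. A small helper for building that URI (the function name is illustrative):

```python
import os

def bundle_uri(model_dir, name="model.zip"):
    """Build the jar:file: URI that serializeToBundle expects, forcing the
    path inside the URI to be absolute as MLeap's examples do."""
    return "jar:file:" + os.path.join(os.path.abspath(model_dir), name)
```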
I also have a problem with mleap 0.15.0 and Spark 2.4.4, basically running the code in https://github.com/combust/mleap-demo/blob/master/notebooks/PySpark%20-%20AirBnb.ipynb
pyspark
[I 17:34:36.883 NotebookApp] Loading IPython parallel extension
...
[W 17:34:53.685 NotebookApp] 404 GET /nbextensions/nbextensions_configurator/config_menu/main.js?v=20200212173436 (::1) 7.00ms referer=http://localhost:8888/notebooks/MLeap.ipynb
[I 17:34:54.021 NotebookApp] Kernel started: 56fef487-0dea-47ba-8ad3-8c19241c1193
[W 17:34:54.175 NotebookApp] 404 GET /nbextensions/widgets/notebook/js/extension.js?v=20200212173436 (::1) 2.67ms referer=http://localhost:8888/notebooks/MLeap.ipynb
Ivy Default Cache set to: /Users/ggao/.ivy2/cache
The jars for the packages stored in: /Users/ggao/.ivy2/jars
:: loading settings :: url = jar:file:/usr/local/Cellar/apache-spark/2.4.4/libexec/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.spark#spark-avro_2.11 added as a dependency
ml.combust.mleap#mleap-spark_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-dbeefc3f-8e12-443d-8629-8adf19670d42;1.0
confs: [default]
found org.apache.spark#spark-avro_2.11;2.4.4 in central
found org.spark-project.spark#unused;1.0.0 in local-m2-cache
found ml.combust.mleap#mleap-spark_2.11;0.15.0 in central
found ml.combust.mleap#mleap-spark-base_2.11;0.15.0 in central
found ml.combust.mleap#mleap-runtime_2.11;0.15.0 in central
found ml.combust.mleap#mleap-core_2.11;0.15.0 in central
found ml.combust.mleap#mleap-base_2.11;0.15.0 in central
found ml.combust.mleap#mleap-tensor_2.11;0.15.0 in central
found io.spray#spray-json_2.11;1.3.2 in central
found com.github.rwl#jtransforms;2.4.0 in central
found ml.combust.bundle#bundle-ml_2.11;0.15.0 in central
found com.google.protobuf#protobuf-java;3.5.1 in central
found com.thesamet.scalapb#scalapb-runtime_2.11;0.7.1 in local-m2-cache
found com.thesamet.scalapb#lenses_2.11;0.7.0-test2 in local-m2-cache
found com.lihaoyi#fastparse_2.11;1.0.0 in local-m2-cache
found com.lihaoyi#fastparse-utils_2.11;1.0.0 in local-m2-cache
found com.lihaoyi#sourcecode_2.11;0.1.4 in local-m2-cache
found com.jsuereth#scala-arm_2.11;2.0 in central
found com.typesafe#config;1.3.0 in local-m2-cache
found commons-io#commons-io;2.5 in local-m2-cache
found org.scala-lang#scala-reflect;2.11.8 in local-m2-cache
found ml.combust.bundle#bundle-hdfs_2.11;0.15.0 in central
:: resolution report :: resolve 547ms :: artifacts dl 16ms
:: modules in use:
com.github.rwl#jtransforms;2.4.0 from central in [default]
com.google.protobuf#protobuf-java;3.5.1 from central in [default]
com.jsuereth#scala-arm_2.11;2.0 from central in [default]
com.lihaoyi#fastparse-utils_2.11;1.0.0 from local-m2-cache in [default]
com.lihaoyi#fastparse_2.11;1.0.0 from local-m2-cache in [default]
com.lihaoyi#sourcecode_2.11;0.1.4 from local-m2-cache in [default]
com.thesamet.scalapb#lenses_2.11;0.7.0-test2 from local-m2-cache in [default]
com.thesamet.scalapb#scalapb-runtime_2.11;0.7.1 from local-m2-cache in [default]
com.typesafe#config;1.3.0 from local-m2-cache in [default]
commons-io#commons-io;2.5 from local-m2-cache in [default]
io.spray#spray-json_2.11;1.3.2 from central in [default]
ml.combust.bundle#bundle-hdfs_2.11;0.15.0 from central in [default]
ml.combust.bundle#bundle-ml_2.11;0.15.0 from central in [default]
ml.combust.mleap#mleap-base_2.11;0.15.0 from central in [default]
ml.combust.mleap#mleap-core_2.11;0.15.0 from central in [default]
ml.combust.mleap#mleap-runtime_2.11;0.15.0 from central in [default]
ml.combust.mleap#mleap-spark-base_2.11;0.15.0 from central in [default]
ml.combust.mleap#mleap-spark_2.11;0.15.0 from central in [default]
ml.combust.mleap#mleap-tensor_2.11;0.15.0 from central in [default]
org.apache.spark#spark-avro_2.11;2.4.4 from central in [default]
org.scala-lang#scala-reflect;2.11.8 from local-m2-cache in [default]
org.spark-project.spark#unused;1.0.0 from local-m2-cache in [default]
:: evicted modules:
com.google.protobuf#protobuf-java;3.5.0 by [com.google.protobuf#protobuf-java;3.5.1] in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 23 | 0 | 0 | 1 || 22 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-dbeefc3f-8e12-443d-8629-8adf19670d42
confs: [default]
0 artifacts copied, 22 already retrieved (0kB/15ms)
20/02/12 17:34:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[I 17:34:59.073 NotebookApp] Adapting from protocol version 5.1 (kernel 56fef487-0dea-47ba-8ad3-8c19241c1193) to 5.3 (client).
20/02/12 17:35:36 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
20/02/12 17:36:22 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
20/02/12 17:36:23 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
20/02/12 17:36:23 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
Exception in thread "Thread-4" java.lang.NoClassDefFoundError: ml/combust/bundle/serializer/SerializationFormat
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at py4j.reflection.CurrentThreadClassLoadingStrategy.classForName(CurrentThreadClassLoadingStrategy.java:40)
at py4j.reflection.ReflectionUtil.classForName(ReflectionUtil.java:51)
at py4j.reflection.TypeUtil.forName(TypeUtil.java:243)
at py4j.commands.ReflectionCommand.getUnknownMember(ReflectionCommand.java:175)
at py4j.commands.ReflectionCommand.execute(ReflectionCommand.java:87)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: ml.combust.bundle.serializer.SerializationFormat
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 9 more
The error from the notebook
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/usr/local/Cellar/apache-spark/2.4.4/libexec/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/Cellar/apache-spark/2.4.4/libexec/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
response = connection.send_command(command)
File "/usr/local/Cellar/apache-spark/2.4.4/libexec/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
---------------------------------------------------------------------------
Py4JError Traceback (most recent call last)
<ipython-input-18-e6e5bbbb80b2> in <module>()
----> 1 sparkPipelineLr.serializeToBundle(f"jar:file:{root_dir}/out/pyspark.lr.zip", sparkPipelineLr.transform(dataset_imputed))
2 sparkPipelineLogr.serializeToBundle(f"jar:file:{root_dir}/out/pyspark.logr.zip", dataset=sparkPipelineLogr.transform(dataset_imputed))
/usr/local/lib/python3.7/site-packages/mleap/pyspark/spark_support.py in serializeToBundle(self, path, dataset)
22
23 def serializeToBundle(self, path, dataset=None):
---> 24 serializer = SimpleSparkSerializer()
25 serializer.serializeToBundle(self, path, dataset=dataset)
26
/usr/local/lib/python3.7/site-packages/mleap/pyspark/spark_support.py in __init__(self)
37 def __init__(self):
38 super(SimpleSparkSerializer, self).__init__()
---> 39 self._java_obj = _jvm().ml.combust.mleap.spark.SimpleSparkSerializer()
40
41 def serializeToBundle(self, transformer, path, dataset):
/usr/local/Cellar/apache-spark/2.4.4/libexec/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __getattr__(self, name)
1596 answer[proto.CLASS_FQN_START:], self._gateway_client)
1597 else:
-> 1598 raise Py4JError("{0} does not exist in the JVM".format(new_fqn))
1599
1600
Py4JError: ml.combust.mleap.spark.SimpleSparkSerializer does not exist in the JVM
@felixgao Have you fixed this problem? I am using the same versions and facing the same issue.
I agree with others that this is a tricky dependency problem, not a problem with MLeap per se. Here is how I solved it on my MacBook:
spark-submit --packages ml.combust.mleap:mleap-spark_2.11:0.16.0 my_program.py
My PySpark version is 2.4.5 (see the MLeap Github page for what version of MLeap works with what version of Spark).
When I first ran spark-submit, I got a further error that Spark could not download some additional dependencies: these can be installed with Maven.
First, run brew install maven from the command line.
Then, use maven from the command line to download dependencies. Here are the three I needed:
mvn org.apache.maven.plugins:maven-dependency-plugin:3.1.2:get -Dartifact=org.scala-lang:scala-reflect:2.11.12
mvn org.apache.maven.plugins:maven-dependency-plugin:3.1.2:get -Dartifact=com.google.protobuf:protobuf-java:3.5.1
mvn org.apache.maven.plugins:maven-dependency-plugin:3.1.2:get -Dartifact=com.typesafe:config:1.3.0
If you need different jars, you can find the coordinates by searching mvnrepository.com in your browser.
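The three mvn invocations above all follow one pattern, so the command for any coordinate can be assembled with a tiny sketch (the plugin version 3.1.2 is simply copied from the steps above):

```python
def mvn_get_command(coordinate):
    """Assemble the mvn dependency:get invocation that pre-fetches one
    group:artifact:version coordinate into the local ~/.m2 repository."""
    return [
        "mvn",
        "org.apache.maven.plugins:maven-dependency-plugin:3.1.2:get",
        "-Dartifact=" + coordinate,
    ]
```

Once Maven is installed, each command can be run with subprocess.run(mvn_get_command("..."), check=True).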
Hi,
I am trying to build an AWS SageMaker model which includes a Spark pipeline model for feature transformation. When I use mleap inside my Docker container to serialize the PipelineModel, I get a similar exception. I am not sure how to get all these mleap jars into my Docker container. Can anyone help me get around this?
Same issue here running pyspark 2.4.3 and mleap 0.17.0. I tried two things: adding all the jar files manually to the jars folder in pyspark, and running with spark-submit:
spark-submit --packages ml.combust.mleap:mleap-spark_2.12:0.17.0 main.py
Neither method worked.
Got the same issue. When running the code from the tutorial:
fittedPipeline.serializeToBundle("jar:file:/tmp/pyspark.example.zip", fittedPipeline.transform(df2))
I got the following error:
> ---------------------------------------------------------------------------
> TypeError Traceback (most recent call last)
> /tmp/ipykernel_5527/4288136627.py in <module>
> ----> 1 fittedPipeline.serializeToBundle("jar:file:/tmp/pyspark.example.zip", fittedPipeline.transform(df2))
>
> ~/conda/pyspark30_p37_cpu_v2/lib/python3.7/site-packages/mleap/pyspark/spark_support.py in serializeToBundle(self, path, dataset)
> 22
> 23 def serializeToBundle(self, path, dataset=None):
> ---> 24 serializer = SimpleSparkSerializer()
> 25 serializer.serializeToBundle(self, path, dataset=dataset)
> 26
>
> ~/conda/pyspark30_p37_cpu_v2/lib/python3.7/site-packages/mleap/pyspark/spark_support.py in __init__(self)
> 37 def __init__(self):
> 38 super(SimpleSparkSerializer, self).__init__()
> ---> 39 self._java_obj = _jvm().ml.combust.mleap.spark.SimpleSparkSerializer()
> 40
> 41 def serializeToBundle(self, transformer, path, dataset):
>
> TypeError: 'JavaPackage' object is not callable
Any suggestions, please?
Also, I tried to install mleap from source, following the instructions, but got this error:
[error] (mleap-core/compile:compileIncremental) javac returned nonzero exit code
[error] Total time: 117 s, completed Nov 14, 2021 11:36:52 PM
If you are using mleap 0.21.1, should serializeToBundle work? I am getting the error below; is the only option to go down a version? pyspark is 3.1.3. This is after resolving several other issues.
Py4JError: ml.combust.mleap.spark.SimpleSparkSerializer does not exist in the JVM
I create a Spark session like this:

def gen_spark_session():
    return SparkSession.builder.appName("happy").config(
        "hive.exec.dynamic.partition", "True").config(
        "hive.exec.dynamic.partition.mode", "nonstrict").config(
        "spark.jars.packages",
        "ml.combust.mleap:mleap-spark_2.12:0.20.0,"
        "ml.combust.mleap:mleap-spark-base_2.12:0.20.0"
    ).enableHiveSupport().getOrCreate()

spark = gen_spark_session()
UPDATE: I was on Java 8, and apparently 0.21.1 is no good there; it needs Java 11. I moved to 0.20.0, but I still get this issue. I'm on Scala 2.12.
If I run the code as above, there's an issue with featurePipeline.serializeToBundle("jar:file:/tmp/pyspark.example.zip"): AttributeError: 'Pipeline' object has no attribute 'serializeToBundle'.
If I use the following code instead:
featurePipeline2 = featurePipeline.fit(df2)
featurePipeline2.serializeToBundle("jar:file:/tmp/pyspark.example.zip")
there is an error at self._java_obj = _jvm().ml.combust.mleap.spark.SimpleSparkSerializer(), saying "TypeError: 'JavaPackage' object is not callable". How can I solve it?
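When you hit "'JavaPackage' object is not callable", the MLeap Scala classes never reached the driver JVM. A diagnostic sketch that checks a list of jar file names for the MLeap artifacts this thread suggests are needed (the exact required set and the artifact names are an assumption based on the jars commenters listed):

```python
def missing_mleap_artifacts(jar_names, scala="2.12"):
    """Given the jar file names actually visible to the driver, report which
    of the MLeap artifacts this thread suggests are required for
    serializeToBundle are absent."""
    needed = [
        "mleap-spark_" + scala,
        "mleap-spark-base_" + scala,
        "mleap-runtime_" + scala,
    ]
    present = set()
    for jar in jar_names:
        for artifact in needed:
            # Jar names look like mleap-spark_2.12-0.20.0.jar.
            if jar.startswith(artifact + "-"):
                present.add(artifact)
    return [a for a in needed if a not in present]
```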