Closed: omnoy closed this issue 2 months ago.
Hi,
Could you please add the Maven package or the fat JAR (from our release notes) to the SparkSession explicitly? https://sparknlp.org/docs/en/install#start-spark-nlp-session-from-python
spark = SparkSession.builder \
    .appName("Spark NLP") \
    .master("local[*]") \
    .config("spark.driver.memory", "16G") \
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.kryoserializer.buffer.max", "2000M") \
    .config("spark.driver.maxResultSize", "0") \
    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.2") \
    .getOrCreate()
Here is an example: https://colab.research.google.com/drive/1lFozDD16iRQg5z3wQKUi0-7oUKfPOw6H?usp=sharing
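One thing worth double-checking when pinning the coordinate by hand: the `_2.12` suffix in `spark-nlp_2.12` is the Scala binary version, and it has to match the Scala build of the Spark distribution that actually launches the session. A minimal sketch of that comparison (the helper names and parsing are illustrative, not part of Spark NLP or PySpark):

```python
def scala_suffix(coordinate: str) -> str:
    """Extract the Scala binary version from a Maven coordinate,
    e.g. 'com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.2' -> '2.12'."""
    artifact = coordinate.split(":")[1]   # 'spark-nlp_2.12'
    return artifact.rsplit("_", 1)[1]     # '2.12'

def is_compatible(coordinate: str, runtime_scala_version: str) -> bool:
    """True when the artifact's Scala suffix matches the runtime's
    Scala binary version (the major.minor of e.g. '2.12.18')."""
    binary = ".".join(runtime_scala_version.split(".")[:2])
    return scala_suffix(coordinate) == binary
```

For example, `is_compatible("com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.2", "2.12.18")` holds, while the same coordinate against a `2.13.x` runtime does not.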
I added it, and the same error pops up. Running the example ipynb locally also hits the same issue:
Exception in thread "Thread-5" java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at py4j.reflection.CurrentThreadClassLoadingStrategy.classForName(CurrentThreadClassLoadingStrategy.java:40)
	at py4j.reflection.ReflectionUtil.classForName(ReflectionUtil.java:51)
	at py4j.reflection.TypeUtil.forName(TypeUtil.java:243)
	at py4j.commands.ReflectionCommand.getUnknownMember(ReflectionCommand.java:175)
	at py4j.commands.ReflectionCommand.execute(ReflectionCommand.java:87)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce
	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 10 more
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/home/omer/.local/lib/python3.10/site-packages/py4j/clientserver.py", line 516, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/omer/.local/lib/python3.10/site-packages/py4j/java_gateway.py", line 1038, in send_command
    response = connection.send_command(command)
  File "/home/omer/.local/lib/python3.10/site-packages/py4j/clientserver.py", line 539, in send_command
    raise Py4JNetworkError(
py4j.protocol.Py4JNetworkError: Error while sending or receiving
---------------------------------------------------------------------------
Py4JError Traceback (most recent call last)
Cell In[2], line 8
      5 from sparknlp.annotator import UniversalSentenceEncoder, SentimentDLModel
      6 from pyspark.ml import Pipeline
----> 8 document_assembler = DocumentAssembler() \
      9     .setInputCol("text") \
     10     .setOutputCol("document")
     12 use = UniversalSentenceEncoder.pretrained('tfhub_use', lang="en") \
     13     .setInputCols(["document"])\
     14     .setOutputCol("sentence_embeddings")
     16 classifier = SentimentDLModel().pretrained('sentimentdl_use_twitter')\
     17     .setInputCols(["sentence_embeddings"])\
     18     .setOutputCol("sentiment")

File ~/anaconda3/envs/projectEnv/lib/python3.10/site-packages/pyspark/__init__.py:139, in keyword_only.<locals>.wrapper(self, *args, **kwargs)
    137     raise TypeError("Method %s forces keyword arguments." % func.__name__)
    138 self._input_kwargs = kwargs
--> 139 return func(self, **kwargs)

File ~/.local/lib/python3.10/site-packages/sparknlp/base/document_assembler.py:96, in DocumentAssembler.__init__(self)
     94 @keyword_only
     95 def __init__(self):
---> 96     super(DocumentAssembler, self).__init__(classname="com.johnsnowlabs.nlp.DocumentAssembler")
     97     self._setDefault(outputCol="document", cleanupMode='disabled')

File ~/anaconda3/envs/projectEnv/lib/python3.10/site-packages/pyspark/__init__.py:139, in keyword_only.<locals>.wrapper(self, *args, **kwargs)
    137     raise TypeError("Method %s forces keyword arguments." % func.__name__)
    138 self._input_kwargs = kwargs
--> 139 return func(self, **kwargs)

File ~/.local/lib/python3.10/site-packages/sparknlp/internal/annotator_transformer.py:36, in AnnotatorTransformer.__init__(self, classname)
     34 self.setParams(**kwargs)
     35 self.__class__._java_class_name = classname
---> 36 self._java_obj = self._new_java_obj(classname, self.uid)

File ~/anaconda3/envs/projectEnv/lib/python3.10/site-packages/pyspark/ml/wrapper.py:84, in JavaWrapper._new_java_obj(java_class, *args)
     82 java_obj = _jvm()
     83 for name in java_class.split("."):
---> 84     java_obj = getattr(java_obj, name)
     85 java_args = [_py2java(sc, arg) for arg in args]
     86 return java_obj(*java_args)

File ~/.local/lib/python3.10/site-packages/py4j/java_gateway.py:1664, in JavaPackage.__getattr__(self, name)
   1661     return JavaClass(
   1662         answer[proto.CLASS_FQN_START:], self._gateway_client)
   1663 else:
-> 1664     raise Py4JError("{0} does not exist in the JVM".format(new_fqn))

Py4JError: com.johnsnowlabs.nlp.DocumentAssembler does not exist in the JVM
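For what it's worth, `scala.collection.GenTraversableOnce` was removed in Scala 2.13, so a `NoClassDefFoundError` for it usually means a jar compiled against Scala 2.11/2.12 is being loaded by a Scala 2.13 runtime, or that the classpath mixes `scala-library` versions (e.g. a stale jar copied into the Spark install). A small sketch for listing which spark-nlp jars a Spark home actually carries; the directory layout and filename pattern are assumptions about a typical Spark install, not part of any API:

```python
import re
from pathlib import Path

def find_spark_nlp_jars(jars_dir):
    """Return (scala_suffix, version, filename) for every spark-nlp jar
    found directly under jars_dir, so mismatched or duplicate Scala
    builds are easy to spot."""
    pattern = re.compile(r"spark-nlp_(\d+\.\d+)-(.+)\.jar$")
    hits = []
    for jar in sorted(Path(jars_dir).glob("*.jar")):
        m = pattern.match(jar.name)
        if m:
            hits.append((m.group(1), m.group(2), jar.name))
    return hits

# e.g. find_spark_nlp_jars("/opt/spark/jars") would surface a manually
# copied spark-nlp_2.12-*.jar sitting next to Scala 2.13 libraries.
```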
As you can see, there is no issue with the library; running Apache Spark alone on Windows is just very tricky. I recommend first making sure everything works with PySpark alone. Your environment does not seem to be set up correctly for Spark.
Is there an existing issue for this?
Who can help?
@maziyarpanahi
What are you working on?
I have been struggling to run a simple demo of the Spark NLP sentiment analysis, initializing via a SparkSession, since I am planning to run an integration between Kafka and Spark NLP.
Current Behavior
Exception in thread "Thread-5" java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at py4j.reflection.CurrentThreadClassLoadingStrategy.classForName(CurrentThreadClassLoadingStrategy.java:40)
	at py4j.reflection.ReflectionUtil.classForName(ReflectionUtil.java:51)
	at py4j.reflection.TypeUtil.forName(TypeUtil.java:243)
	at py4j.commands.ReflectionCommand.getUnknownMember(ReflectionCommand.java:175)
	at py4j.commands.ReflectionCommand.execute(ReflectionCommand.java:87)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce
	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 10 more
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/clientserver.py", line 516, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1038, in send_command
    response = connection.send_command(command)
  File "/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/clientserver.py", line 539, in send_command
    raise Py4JNetworkError(
py4j.protocol.Py4JNetworkError: Error while sending or receiving
Expected Behavior
Demo should run without an error.
Steps To Reproduce
Run the two code blocks above in a Jupyter notebook with a conda kernel on Python 3.10.14; I also tried a plain Python 3.10.14 kernel with the same result.
Spark NLP version and Apache Spark
sparknlp.version()=5.4.2 spark.version=3.5.1
Type of Spark Application
Python Application
Java Version
openjdk version "1.8.0_422"; same result with Java 11 as well.
Java Home Directory
No response
Setup and installation
Spark NLP and Spark were installed with pip in both the conda environment and the base Python. The Spark NLP JAR was also downloaded and moved to /opt/spark/jars.
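Note that the traceback above resolves pyspark from `~/anaconda3/envs/projectEnv/...` but sparknlp and py4j from `~/.local/...`; mixed installs like this (pip in base Python, pip in the conda env, plus a manually copied jar in /opt/spark/jars) are a common source of classpath conflicts. A quick, stdlib-only way to see which copies a given kernel would actually import (the module list is illustrative):

```python
import importlib.util

def module_origin(name):
    """Return the file a top-level module would be imported from,
    or None if it is not importable in this interpreter."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# In the failing kernel, check where each player resolves from;
# mismatched prefixes (conda env vs ~/.local) hint at a mixed install.
for mod in ("pyspark", "py4j", "sparknlp"):
    print(mod, "->", module_origin(mod))
```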
Operating System and Version
5.15.153.1-microsoft-standard-WSL2, Ubuntu 22.04.2 LTS
Link to your project (if available)
No response
Additional Information
No response