Closed bigheadming closed 6 years ago
Hi @bigheadming
You do have this jar here and with this name right?
.config("spark.driver.extraClassPath", "lib/sparknlp.jar")
I have the feeling you took the example as it is, without changing the path to the appropriate jar
I just installed spark-nlp for Python and I get the same error... How do I fix this? I tried @saifjsl's solution, but it didn't work in my case...
Any other suggestions?
Thanks for the help!
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-5-10341c54d325> in <module>()
1 ### Define the dataframe
----> 2 document_assembler = DocumentAssembler() .setInputCol("text")
3
4 ### Transform input to appropriate schema
5 #assembled = document_assembler.transform(data)
C:\ProgramData\Anaconda3\lib\site-packages\pyspark\__init__.py in wrapper(self, *args, **kwargs)
103 raise TypeError("Method %s forces keyword arguments." % func.__name__)
104 self._input_kwargs = kwargs
--> 105 return func(self, **kwargs)
106 return wrapper
107
C:\ProgramData\Anaconda3\lib\site-packages\sparknlp\base.py in __init__(self)
169 @keyword_only
170 def __init__(self):
--> 171 super(DocumentAssembler, self).__init__(classname="com.johnsnowlabs.nlp.DocumentAssembler")
172 self._setDefault(outputCol="document")
173
C:\ProgramData\Anaconda3\lib\site-packages\pyspark\__init__.py in wrapper(self, *args, **kwargs)
103 raise TypeError("Method %s forces keyword arguments." % func.__name__)
104 self._input_kwargs = kwargs
--> 105 return func(self, **kwargs)
106 return wrapper
107
C:\ProgramData\Anaconda3\lib\site-packages\sparknlp\base.py in __init__(self, classname)
18 self.setParams(**kwargs)
19 self.__class__._java_class_name = classname
---> 20 self._java_obj = self._new_java_obj(classname, self.uid)
21
22
C:\ProgramData\Anaconda3\lib\site-packages\pyspark\ml\wrapper.py in _new_java_obj(java_class, *args)
61 java_obj = getattr(java_obj, name)
62 java_args = [_py2java(sc, arg) for arg in args]
---> 63 return java_obj(*java_args)
64
65 @staticmethod
TypeError: 'JavaPackage' object is not callable
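For context on where this message comes from: py4j (which PySpark uses to talk to the JVM) returns a JavaPackage placeholder for any JVM name it cannot resolve, and attribute access on that placeholder never fails; only calling it does. A rough pure-Python analogy (not py4j's actual implementation):

```python
class JavaPackage:
    """Stand-in for py4j's placeholder: looking up an unknown JVM name
    silently yields another package object instead of raising."""
    def __getattr__(self, name):
        return JavaPackage()

jvm = JavaPackage()
# No error here, even though nothing was ever loaded on the JVM side...
cls = jvm.com.johnsnowlabs.nlp.DocumentAssembler
try:
    cls()  # ...the failure only surfaces when the "class" is called
except TypeError as e:
    print(e)  # 'JavaPackage' object is not callable
```

This is why the error appears so late and so far from the real cause: the missing JAR was never noticed at lookup time.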
@snape6666 it is not a problem in the library. The solution depends on each Spark user's own environment.
Nowadays, Spark is widely used in different environments (Hadoop clusters with YARN, standalone with S3, Azure, Amazon EMR, etc.).
It is up to each user to understand their environment well enough to know how to make the Spark executors reach the JARs on a classpath. We could never cover all possible use cases; that would be impossible.
The issue here is just a matter of the Spark executors not finding the JAR on the appropriate classpath; it would happen with any JVM library you use. Usually, when creating the SparkSession in Python, setting spark.driver.extraClassPath and spark.executor.extraClassPath is enough, although there are many other ways to do it (e.g. Spark environment variables such as SPARK_JARS or SPARK_DAEMON_CLASSPATH).
This use case clearly shows the user is running with the example's default path, probably as a result of copy-pasting the SparkSession builder example from the README.
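Since the root cause in this thread is almost always a wrong or relative JAR path, a tiny pre-flight check before building the session surfaces the problem early. This is purely an illustrative helper; the function name and the example path are not part of Spark NLP or PySpark:

```python
import os

def check_jar(path):
    """Fail fast with a clear message instead of hitting the opaque
    'JavaPackage' object is not callable error much later."""
    resolved = os.path.abspath(path)
    if not os.path.isfile(resolved):
        raise FileNotFoundError("Spark NLP JAR not found at " + resolved)
    return resolved

# e.g. check_jar("lib/sparknlp.jar") before passing it to extraClassPath
```

Resolving to an absolute path also avoids surprises when the notebook's working directory is not what you expect.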
Thanks for the reply. I know that the system searches for the JAR file, but which one?^^ I added the following, with no change:
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars C:/MYPATH/lib/spark-nlp-assembly-1.6.0.jar pyspark-shell'
I replaced MYPATH, of course ;)
Or does DocumentAssembler() search for the spark...jar? I'm new to using Spark, so sorry for my stupid questions :">
At the moment I have just installed spark and spark-nlp on Anaconda 3 and I'm using a Jupyter notebook to run everything.
OK, this can be closed. I found this nice tutorial =)
Hey guys, I'm sorry to ask again, but this is driving me crazy..^^
The following works fine:
import pyspark # only run after findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.sql('''select 'spark' as hello ''')
df.show()
but when I try the following afterwards, it crashes with the error "TypeError: 'JavaPackage' object is not callable":
from sparknlp.base import DocumentAssembler
documentAssembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
I tried to add the path like this, but it didn't help:
SPARK_NLP = "MYPATH/spark-nlp-assembly-1.6.0.jar" # FAT-JAR
spark.conf.set("spark.driver.extraClassPath", SPARK_NLP)
spark.conf.set("spark.executor.extraClassPath", SPARK_NLP)
I also tried this:
export SPARK_SUBMIT_OPTIONS="--packages JohnSnowLabs:spark-nlp:1.6.0"
No luck...
The same happens with the following in the notebook:
import os
os.environ['PYTHONPATH'] = "~/.ivy2/jars/JohnSnowLabs_spark-nlp-1.6.0.jar:$PYTHONPATH"
os.environ['PYSPARK_DRIVER_PYTHON'] = "jupyter"
os.environ['PYSPARK_DRIVER_PYTHON_OPTS'] = "notebook"
os.environ['SPARK_SUBMIT_OPTIONS'] = "--packages JohnSnowLabs:spark-nlp:1.6.0"
And in the Spark shell I have the following:
spark://MYPCNAME/jars/JohnSnowLabs_spark-nlp-1.6.0.jar | Added By User
What am I doing wrong? I'm really exhausted...^^
Thanks in advance, and sorry for my English!
Found the solution!
For everyone with the same problem:
I'm using the prebuilt version of Spark with Hadoop, so the easiest way to get spark-nlp running is to copy the Spark NLP fat JAR directly into the spark-2.x.x-bin-hadoop2.7/jars folder, where Spark can see it. The other approaches didn't work for me...^^
I hope this post helps some other users =)
So now it's working, and thanks for your previous help!
Thanks @snape6666!
Thanks from me as well, @snape6666 !
Of course, you can download the provided JAR or fat JAR and make it available to your Spark installation in many ways. However, if you are using pip or conda to install spark-nlp, then when it comes to upgrading you also have to manually download the right JAR again!
The solution is to find the sparknlp.jar path inside your installed packages. For instance, mine is at site-packages/sparknlp/lib/sparknlp.jar. If you use an environment, you have to look inside your env directory. This way, every time you --upgrade, you won't need to download any external JAR, since it comes with the Python package.
Example of sparknlp.jar with an environment called spark: envs/spark/lib/python3.6/site-packages/sparknlp/lib/sparknlp.jar
The Python environment can live in different places, such as Anaconda3 or your home directory. Once you have found it, there is no need to keep downloading the JAR and moving it into the Spark directory. You just use this Spark session code:
spark = SparkSession.builder \
.appName("SentimentDetector")\
.master("local[*]")\
.config("spark.driver.memory","8G")\
.config("spark.driver.maxResultSize", "2G")\
.config("spark.jars", "/Users/maziyar/anaconda3/envs/spark/lib/python3.6/site-packages/sparknlp/lib/sparknlp.jar")\
.config("spark.driver.extraClassPath", "/Users/maziyar/anaconda3/envs/spark/lib/python3.6/site-packages/sparknlp/lib/sparknlp.jar")\
.config("spark.executor.extraClassPath", "/Users/maziyar/anaconda3/envs/spark/lib/python3.6/site-packages/sparknlp/lib/sparknlp.jar")\
.config("spark.kryoserializer.buffer.max", "500m")\
.getOrCreate()
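If you would rather not hard-code that absolute path, the JAR location can be derived from the installed package itself. A sketch using only the standard library; it assumes the lib/sparknlp.jar layout described above, and the helper name is ours, not part of Spark NLP:

```python
import importlib.util
import os

def find_package_file(package, *parts):
    """Return the absolute path of a file shipped inside an installed
    Python package (e.g. a bundled JAR)."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        raise ImportError(package + " is not installed")
    return os.path.join(os.path.dirname(spec.origin), *parts)

# jar = find_package_file("sparknlp", "lib", "sparknlp.jar")
# ...then pass `jar` to the extraClassPath configs above
```

This keeps the session config working across environments and upgrades, because the path always follows wherever pip or conda installed the package.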
I think there's a typo in @maziyarpanahi's answer. The config is spark.jars (plural).
Oh sorry! I'll edit it out since the other two are enough. I checked, and there is no need for spark.jars to be set. Thanks @saifjsl 👍
Now you can just use Spark Packages instead of locating and pointing to sparknlp.jar:
spark = SparkSession.builder \
.appName("ner")\
.master("local[4]")\
.config("spark.driver.memory","8G")\
.config("spark.driver.maxResultSize", "2G") \
.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.4")\
.config("spark.kryoserializer.buffer.max", "500m")\
.getOrCreate()
Only the spark.jars config works for me; no need for the others:
SparkSession.builder.config("spark.jars", "hdfs://somepath/sparknlp.jar")
If you are using a Jupyter notebook, make sure to restart the kernel before setting the config. I ran into problems caused by caching.
Please join Slack or create a new issue if you are experiencing this problem.
In short, this error means the Java side (the JAR) is not loaded for the Python/PyPI package.
This should correctly download and load the JAR from Maven:
import sparknlp
spark = sparknlp.start()
The start() function is equivalent to the following:
spark = SparkSession.builder \
.appName("ner")\
.master("local[4]")\
.config("spark.driver.memory","16G")\
.config("spark.driver.maxResultSize", "2G") \
.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.2")\
.config("spark.kryoserializer.buffer.max", "1000m")\
.getOrCreate()
Hi guys, I've been getting this same "TypeError: 'JavaPackage' object is not callable" error trying to call DocumentAssembler.
Python version: 3.7.6. Environment: Anaconda3 (Jupyter notebooks). OS: Windows 10.
I've tried all of the install methods in the https://nlp.johnsnowlabs.com/docs/en/install 'cheat sheet' as well as manually adding the jar to the pyspark jars folder in my anaconda installation.
I've also tried running my SparkSession with sparknlp.start(), and adding .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.3") to the regular SparkSession.builder call.
thanks :)
Hi @gordenstein72
Sometimes when you add another technology on top of your stack (Jupyter), it's hard to see the real error. So let's try using the pure pyspark PyPI package:
$ java -version
# should be Java 8 (Oracle or OpenJDK)
$ conda create -n sparknlp python=3.6 -y
$ conda activate sparknlp
$ pip install spark-nlp==2.5.3 pyspark==2.4.4
# do not close this terminal nor deactivate this env
# right here please write `python` so you can go to Python console
# now run the following commands to see what can be the real issue
import sparknlp
# make sure the next command doesn't have any error or failed downloads
# this is where the JAR and the dependencies are being downloaded
# so if you are behind the firewall, or proxy or lose internet connectivity it won't load it
# and you see that error
spark = sparknlp.start()
spark.version
sparknlp.version()
That's all you need to use Spark NLP in PySpark. If you need more tests in the same Python console, you can follow these:
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp.embeddings import *
from sparknlp.pretrained import PretrainedPipeline
import sparknlp
# Start Spark Session with Spark NLP
spark = sparknlp.start()
# Download a pre-trained pipeline
pipeline = PretrainedPipeline('explain_document_dl', lang='en')
# Your testing dataset
text = """
The Mona Lisa is a 16th-century oil painting created by Leonardo.
It's held at the Louvre in Paris.
"""
# Annotate your testing dataset
result = pipeline.annotate(text)
# What's in the pipeline
list(result.keys())
# Check the results
result['entities']
If you experience any issue, please provide detailed steps for how you installed it, your environment, the full code, and the full error so we can reproduce it.
This issue appears again in Databricks on Spark 3.0.1 and spark-nlp 2.6.0 (7.4 ML cluster). Could you advise how to solve it there?
@msteller-Ai we don't support Spark 3.x yet; you can find the supported versions and Databricks runtimes here:
https://github.com/JohnSnowLabs/spark-nlp#apache-spark-support
Hi @maziyarpanahi, sorry to jump in. Any idea when you plan to support "spark-3.0.1-bin-hadoop3.2"?
Thanks
No worries. The first RC1 will be out on 12 March, and hopefully the final release by the end of March.
This is such amazing news! Very much looking forward to testing it.
Thanks a million for this great news!
This issue appears again in Databricks on Spark 3.0.1 and spark-nlp 2.6.0 (7.4 ML cluster). Could you advise how to solve it there?
Hey, I see that spark-nlp now supports Spark 3.0. I'm getting the same error in Databricks. Any advice now?
We released a release candidate and announced it on Slack. There is no final release available yet; when we publish it, it will appear in the release notes, the README, and the Maven repository as Spark NLP 3.0.0. (The old artifacts can't simply become compatible.)
I know this is closed, but if you are submitting this via a Qubole scheduled job, add
--packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:3.0.2
to the Spark Submit Command Line Options section
Thanks @franckjay
Just to expand for future users:
If you are using Spark NLP on Spark/PySpark 2.3.x or 2.4.x:
com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:3.0.3
com.johnsnowlabs.nlp:spark-nlp-spark24_2.11:3.0.3
If you are using Spark NLP on Spark/PySpark 3.0.x or 3.1.x:
com.johnsnowlabs.nlp:spark-nlp_2.12:3.0.3
TypeError Traceback (most recent call last)
I use Java 11.0.10!
Requirements: https://github.com/JohnSnowLabs/spark-nlp#requirements
I have issues configuring Java 8 with PySpark; only Java 11.0.10 works for me. I uninstalled Java 11, switched to Java 8, and the Spark session stopped working. Thank you so much for the reply.
I am sorry, it's really hard to help with such a minimal amount of information, especially when it changes later. Please create a new issue, complete the template, provide all the required information (OS, versions, etc.), the steps to reproduce the error, and any necessary code snippets.
Getting a "TypeError: 'JavaPackage' object is not callable" error whenever trying to call any annotator.
Description
Platform: Ubuntu 16.04 LTS on Windows 10's Linux subsystem (WSL). Python: Python 3.6.4 |Anaconda custom (64-bit)| (default, Jan 16 2018, 18:10:19). PySpark: installed via pip (i.e. Python only, without an explicit Spark installation). spark-nlp: pip install --index-url https://test.pypi.org/simple/ spark-nlp==1.5.4
I tried running the following, but all of them returned the same "TypeError: 'JavaPackage' object is not callable" error. There is a similar bug, "Python annotators should be loadable on its own #91", that was closed some time ago, but it still happens to me.
from pyspark.sql import SparkSession
spark = SparkSession \
    .builder \
    .config("spark.driver.extraClassPath", "lib/sparknlp.jar") \
    .getOrCreate()

from sparknlp.annotator import *
from sparknlp.common import *
from sparknlp.base import *

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

lemmatizer = Lemmatizer() \
    .setInputCols(["token"]) \
    .setOutputCol("lemma") \
    .setDictionary("./lemmas001.txt")

normalizer = Normalizer() \
    .setInputCols(["token"]) \
    .setOutputCol("normalized")
Here are the errors:
=== from documentassembler ==============================================
File "", line 1, in
documentAssembler = DocumentAssembler() .setInputCol("text") .setOutputCol("document")
File "/home/quickt2/anaconda3/lib/python3.6/site-packages/pyspark/__init__.py", line 105, in wrapper return func(self, **kwargs)
File "/home/quickt2/anaconda3/lib/python3.6/site-packages/sparknlp/base.py", line 175, in __init__ super(DocumentAssembler, self).__init__(classname="com.johnsnowlabs.nlp.DocumentAssembler")
File "/home/quickt2/anaconda3/lib/python3.6/site-packages/pyspark/__init__.py", line 105, in wrapper return func(self, **kwargs)
File "/home/quickt2/anaconda3/lib/python3.6/site-packages/sparknlp/base.py", line 20, in __init__ self._java_obj = self._new_java_obj(classname, self.uid)
File "/home/quickt2/anaconda3/lib/python3.6/site-packages/pyspark/ml/wrapper.py", line 63, in _new_java_obj return java_obj(*java_args)
TypeError: 'JavaPackage' object is not callable
=== from lemmatizer ====================================================
Traceback (most recent call last):
File "", line 1, in
lemmatizer = Lemmatizer() .setInputCols(["token"]) .setOutputCol("lemma") .setDictionary("./lemmas001.txt")
File "/home/quickt2/anaconda3/lib/python3.6/site-packages/pyspark/__init__.py", line 105, in wrapper return func(self, **kwargs)
File "/home/quickt2/anaconda3/lib/python3.6/site-packages/sparknlp/annotator.py", line 281, in __init__ super(Lemmatizer, self).__init__(classname="com.johnsnowlabs.nlp.annotators.Lemmatizer")
File "/home/quickt2/anaconda3/lib/python3.6/site-packages/pyspark/__init__.py", line 105, in wrapper return func(self, **kwargs)
File "/home/quickt2/anaconda3/lib/python3.6/site-packages/sparknlp/annotator.py", line 95, in __init__ self._java_obj = self._new_java_obj(classname, self.uid)
File "/home/quickt2/anaconda3/lib/python3.6/site-packages/pyspark/ml/wrapper.py", line 63, in _new_java_obj return java_obj(*java_args)
TypeError: 'JavaPackage' object is not callable
=== from normalizer ====================================================
Traceback (most recent call last):
File "", line 1, in
normalizer = Normalizer() .setInputCols(["token"]) .setOutputCol("normalized")
File "/home/quickt2/anaconda3/lib/python3.6/site-packages/pyspark/__init__.py", line 105, in wrapper return func(self, **kwargs)
File "/home/quickt2/anaconda3/lib/python3.6/site-packages/sparknlp/annotator.py", line 198, in __init__ super(Normalizer, self).__init__(classname="com.johnsnowlabs.nlp.annotators.Normalizer")
File "/home/quickt2/anaconda3/lib/python3.6/site-packages/pyspark/__init__.py", line 105, in wrapper return func(self, **kwargs)
File "/home/quickt2/anaconda3/lib/python3.6/site-packages/sparknlp/annotator.py", line 95, in __init__ self._java_obj = self._new_java_obj(classname, self.uid)
File "/home/quickt2/anaconda3/lib/python3.6/site-packages/pyspark/ml/wrapper.py", line 63, in _new_java_obj return java_obj(*java_args)
TypeError: 'JavaPackage' object is not callable