JohnSnowLabs / spark-nlp

State of the Art Natural Language Processing
https://sparknlp.org/
Apache License 2.0

TypeError: 'JavaPackage' object is not callable #232

Closed bigheadming closed 6 years ago

bigheadming commented 6 years ago

Get "TypeError: 'JavaPackage' object is not callable " error whenever trying to call any annotators.

Description

Platform: Ubuntu 16.04 LTS on Windows 10's Windows Subsystem for Linux (WSL)
Python: Python 3.6.4 |Anaconda custom (64-bit)| (default, Jan 16 2018, 18:10:19)
Pyspark: installed with pip (i.e. Python only, without an explicit Spark installation)
spark-nlp: pip install --index-url https://test.pypi.org/simple/ spark-nlp==1.5.4

I tried running the following, but all of it returned the same "TypeError: 'JavaPackage' object is not callable" error. There seems to be a similar bug, "Python annotators should be loadable on its own #91", that was closed some time ago, but it still happens for me.

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .config("spark.driver.extraClassPath", "lib/sparknlp.jar") \
    .getOrCreate()

from sparknlp.annotator import *
from sparknlp.common import *
from sparknlp.base import *

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

lemmatizer = Lemmatizer() \
    .setInputCols(["token"]) \
    .setOutputCol("lemma") \
    .setDictionary("./lemmas001.txt")

normalizer = Normalizer() \
    .setInputCols(["token"]) \
    .setOutputCol("normalized")

Here are the errors:

=== from documentassembler ==============================================

File "", line 1, in documentAssembler = DocumentAssembler() .setInputCol("text") .setOutputCol("document")

File "/home/quickt2/anaconda3/lib/python3.6/site-packages/pyspark/init.py", line 105, in wrapper return func(self, **kwargs)

File "/home/quickt2/anaconda3/lib/python3.6/site-packages/sparknlp/base.py", line 175, in init super(DocumentAssembler, self).init(classname="com.johnsnowlabs.nlp.DocumentAssembler")

File "/home/quickt2/anaconda3/lib/python3.6/site-packages/pyspark/init.py", line 105, in wrapper return func(self, **kwargs)

File "/home/quickt2/anaconda3/lib/python3.6/site-packages/sparknlp/base.py", line 20, in init self._java_obj = self._new_java_obj(classname, self.uid)

File "/home/quickt2/anaconda3/lib/python3.6/site-packages/pyspark/ml/wrapper.py", line 63, in _new_java_obj return java_obj(*java_args)

TypeError: 'JavaPackage' object is not callable

=== from lemmatizer ====================================================

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    lemmatizer = Lemmatizer().setInputCols(["token"]).setOutputCol("lemma").setDictionary("./lemmas001.txt")
  File "/home/quickt2/anaconda3/lib/python3.6/site-packages/pyspark/__init__.py", line 105, in wrapper
    return func(self, **kwargs)
  File "/home/quickt2/anaconda3/lib/python3.6/site-packages/sparknlp/annotator.py", line 281, in __init__
    super(Lemmatizer, self).__init__(classname="com.johnsnowlabs.nlp.annotators.Lemmatizer")
  File "/home/quickt2/anaconda3/lib/python3.6/site-packages/pyspark/__init__.py", line 105, in wrapper
    return func(self, **kwargs)
  File "/home/quickt2/anaconda3/lib/python3.6/site-packages/sparknlp/annotator.py", line 95, in __init__
    self._java_obj = self._new_java_obj(classname, self.uid)
  File "/home/quickt2/anaconda3/lib/python3.6/site-packages/pyspark/ml/wrapper.py", line 63, in _new_java_obj
    return java_obj(*java_args)
TypeError: 'JavaPackage' object is not callable

=== from normalizer ====================================================

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    normalizer = Normalizer().setInputCols(["token"]).setOutputCol("normalized")
  File "/home/quickt2/anaconda3/lib/python3.6/site-packages/pyspark/__init__.py", line 105, in wrapper
    return func(self, **kwargs)
  File "/home/quickt2/anaconda3/lib/python3.6/site-packages/sparknlp/annotator.py", line 198, in __init__
    super(Normalizer, self).__init__(classname="com.johnsnowlabs.nlp.annotators.Normalizer")
  File "/home/quickt2/anaconda3/lib/python3.6/site-packages/pyspark/__init__.py", line 105, in wrapper
    return func(self, **kwargs)
  File "/home/quickt2/anaconda3/lib/python3.6/site-packages/sparknlp/annotator.py", line 95, in __init__
    self._java_obj = self._new_java_obj(classname, self.uid)
  File "/home/quickt2/anaconda3/lib/python3.6/site-packages/pyspark/ml/wrapper.py", line 63, in _new_java_obj
    return java_obj(*java_args)
TypeError: 'JavaPackage' object is not callable

saif-ellafi commented 6 years ago

Hi @bigheadming

You do have this JAR at that location and with that name, right?

.config("spark.driver.extraClassPath", "lib/sparknlp.jar")

I have the feeling you took the example as-is, without changing the path to point at the actual JAR.

SimonF89 commented 6 years ago

I just installed spark-nlp for Python and I get the same error... How do I fix this? I tried @saifjsl's solution but it didn't work in my case...

Any other suggestions?

Thanks for the help!

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-10341c54d325> in <module>()
      1 ### Define the dataframe
----> 2 document_assembler = DocumentAssembler()             .setInputCol("text")
      3 
      4 ### Transform input to appropriate schema
      5 #assembled = document_assembler.transform(data)

C:\ProgramData\Anaconda3\lib\site-packages\pyspark\__init__.py in wrapper(self, *args, **kwargs)
    103             raise TypeError("Method %s forces keyword arguments." % func.__name__)
    104         self._input_kwargs = kwargs
--> 105         return func(self, **kwargs)
    106     return wrapper
    107 

C:\ProgramData\Anaconda3\lib\site-packages\sparknlp\base.py in __init__(self)
    169     @keyword_only
    170     def __init__(self):
--> 171         super(DocumentAssembler, self).__init__(classname="com.johnsnowlabs.nlp.DocumentAssembler")
    172         self._setDefault(outputCol="document")
    173 

C:\ProgramData\Anaconda3\lib\site-packages\pyspark\__init__.py in wrapper(self, *args, **kwargs)
    103             raise TypeError("Method %s forces keyword arguments." % func.__name__)
    104         self._input_kwargs = kwargs
--> 105         return func(self, **kwargs)
    106     return wrapper
    107 

C:\ProgramData\Anaconda3\lib\site-packages\sparknlp\base.py in __init__(self, classname)
     18         self.setParams(**kwargs)
     19         self.__class__._java_class_name = classname
---> 20         self._java_obj = self._new_java_obj(classname, self.uid)
     21 
     22 

C:\ProgramData\Anaconda3\lib\site-packages\pyspark\ml\wrapper.py in _new_java_obj(java_class, *args)
     61             java_obj = getattr(java_obj, name)
     62         java_args = [_py2java(sc, arg) for arg in args]
---> 63         return java_obj(*java_args)
     64 
     65     @staticmethod

TypeError: 'JavaPackage' object is not callable
saif-ellafi commented 6 years ago

@snape6666 It is not a problem in the library. The solution depends on each Spark user's own environment.

Nowadays, Spark is widely used in many different environments (Hadoop clusters with YARN, standalone with S3, Azure, Amazon EMR, etc.).

It is up to users to understand their environment well enough to know how to make the Spark executors able to reach the JARs on a classpath; we could never cover all possible use cases.

The issue here is just a matter of the Spark executors not finding the JAR on the appropriate classpath. It would happen with any JVM library you use. Usually, when creating the SparkSession in Python, setting spark.driver.extraClassPath and spark.executor.extraClassPath is enough, although there are many other ways to do it (e.g. environment variables for Spark, such as SPARK_JARS or SPARK_DAEMON_CLASSPATH).
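For example, a minimal sketch of a session builder that puts the fat JAR on both classpaths (the path is a placeholder; point it at wherever your JAR actually lives):

from pyspark.sql import SparkSession

SPARK_NLP_JAR = "/full/path/to/spark-nlp-assembly.jar"  # placeholder, adjust to your environment

spark = SparkSession.builder \
    .appName("spark-nlp") \
    .config("spark.driver.extraClassPath", SPARK_NLP_JAR) \
    .config("spark.executor.extraClassPath", SPARK_NLP_JAR) \
    .getOrCreate()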

This use case clearly shows the user kept the example's default path, probably as a result of copy-pasting the SparkSession builder example from the README.

SimonF89 commented 6 years ago

Thanks for the reply. I know that the system is looking for the JAR file, but which one? ^^ I added the following, but nothing changed:

os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars C:/MYPATH/lib/spark-nlp-assembly-1.6.0.jar pyspark-shell'

I exchanged MYPATH, of course ;)

Or does DocumentAssembler() look for the spark...jar? I'm new to using Spark, so sorry for my stupid questions :">

At the moment I have just installed spark and spark-nlp on Anaconda 3 and I'm using a Jupyter notebook to execute.

SimonF89 commented 6 years ago

OK, this can be closed. I found this nice tutorial =)

https://changhsinlee.com/install-pyspark-windows-jupyter/

SimonF89 commented 6 years ago

Hey guys, I'm sorry to ask again, but I'm going crazy with this... ^^

The following works fine:

import pyspark # only run after findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

df = spark.sql('''select 'spark' as hello ''')
df.show()

but when I try the following afterwards, it crashes with the error "TypeError: 'JavaPackage' object is not callable":

from sparknlp.base import DocumentAssembler
documentAssembler = DocumentAssembler()\
  .setInputCol("text")\
  .setOutputCol("document")

I tried to add the path like this, but it didn't help:

SPARK_NLP = "MYPATH/spark-nlp-assembly-1.6.0.jar" # FAT-JAR

spark.conf.set("spark.driver.extraClassPath", SPARK_NLP)
spark.conf.set("spark.executor.extraClassPath", SPARK_NLP)

I also tried this:

export SPARK_SUBMIT_OPTIONS="--packages JohnSnowLabs:spark-nlp:1.6.0"

No chance...

The same with the following in the notebook:

import os
os.environ['PYTHONPATH']  = "~/.ivy2/jars/JohnSnowLabs_spark-nlp-1.6.0.jar:$PYTHONPATH"
os.environ['PYSPARK_DRIVER_PYTHON'] = "jupyter"
os.environ['PYSPARK_DRIVER_PYTHON_OPTS'] = "notebook"
os.environ['SPARK_SUBMIT_OPTIONS'] = "--packages JohnSnowLabs:spark-nlp:1.6.0"

And in the Spark shell I have the following:

spark://MYPCNAME/jars/JohnSnowLabs_spark-nlp-1.6.0.jar | Added By User

What am I doing wrong? I'm really exhausted... ^^

Thanks in advance, and sorry for my English!

SimonF89 commented 6 years ago

Found the solution!

For all with the same problem....

I'm using the prebuilt version of Spark with Hadoop, so the easiest way to get spark-nlp running is to copy the Spark NLP fat JAR directly into the jars folder of spark-2.x.x-bin-hadoop2.7, so that Spark can see it. The other ways didn't work for me... ^^
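A hedged sketch of that copy step (the paths are placeholders; adjust them to your own download location and Spark home):

import os
import shutil

# assumes SPARK_HOME points at the prebuilt spark-2.x.x-bin-hadoop2.7 directory
spark_home = os.environ["SPARK_HOME"]
shutil.copy("spark-nlp-assembly-1.6.0.jar", os.path.join(spark_home, "jars"))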

Hope I can help some other users with this post =)

So now it's working. Thanks for your previous help!

yg37 commented 5 years ago

Thanks @snape6666!

ecmonsen commented 5 years ago

Thanks from me as well, @snape6666 !

maziyarpanahi commented 5 years ago

Of course, you can download the provided JAR or fat JAR and make it available to your Spark installation in many ways. However, if you are using pip or conda to install spark-nlp, then whenever you upgrade it you also have to come back and manually download the right JAR again!

The solution is to find the sparknlp.jar path inside your packages. For instance, mine is at site-packages/sparknlp/lib/sparknlp.jar. If you are using an environment, you have to look inside your env directory. This way, every time you --upgrade, you won't need to download any external JAR, since it comes with the Python package.

Example of sparknlp.jar within an environment called spark: envs/spark/lib/python3.6/site-packages/sparknlp/lib/sparknlp.jar. The Python environment can live in different places, such as inside Anaconda3 or in your home directory. Once you have found it, there is no need to keep downloading the JAR and moving it into the Spark directory. You just use this Spark session code:

spark = SparkSession.builder \
    .appName("SentimentDetector")\
    .master("local[*]")\
    .config("spark.driver.memory","8G")\
    .config("spark.driver.maxResultSize", "2G")\
    .config("spark.jars", "/Users/maziyar/anaconda3/envs/spark/lib/python3.6/site-packages/sparknlp/lib/sparknlp.jar")\
    .config("spark.driver.extraClassPath", "/Users/maziyar/anaconda3/envs/spark/lib/python3.6/site-packages/sparknlp/lib/sparknlp.jar")\
    .config("spark.executor.extraClassPath", "/Users/maziyar/anaconda3/envs/spark/lib/python3.6/site-packages/sparknlp/lib/sparknlp.jar")\
    .config("spark.kryoserializer.buffer.max", "500m")\
    .getOrCreate()
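If you'd rather not hard-code that long path, here is a small sketch that derives it from the installed package (assuming the pip package ships the JAR under sparknlp/lib, as described above):

import os
import sparknlp

# locate the JAR bundled inside the installed sparknlp package
jar_path = os.path.join(os.path.dirname(sparknlp.__file__), "lib", "sparknlp.jar")
print(jar_path)  # use this value in the extraClassPath configs above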
saif-ellafi commented 5 years ago

I think there's a typo in @maziyarpanahi's answer. The config is spark.jars (plural)

maziyarpanahi commented 5 years ago

Oh sorry! I'll edit it out since the other two are enough. I checked and there is no need for spark.jars to be set. Thanks @saifjsl 👍

maziyarpanahi commented 5 years ago

Now you can just use Spark Packages instead of locating and pointing to sparknlp.jar:

spark = SparkSession.builder \
    .appName("ner")\
    .master("local[4]")\
    .config("spark.driver.memory","8G")\
    .config("spark.driver.maxResultSize", "2G") \
    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.4")\
    .config("spark.kryoserializer.buffer.max", "500m")\
    .getOrCreate()
eromoe commented 4 years ago

Only configuring spark.jars worked for me; the others weren't needed.

spark = SparkSession.builder.config("spark.jars", "hdfs://somepath/sparknlp.jar").getOrCreate()

If you are using a Jupyter notebook, make sure you restart the kernel before setting the config; I ran into problems caused by caching.

maziyarpanahi commented 4 years ago

Please join Slack or create a new issue if you are experiencing this problem.

In short, this error means the Java JAR is not loaded inside the Python (PyPI) package.

This should correctly download and load the JAR from Maven:

import sparknlp

spark = sparknlp.start()

The start() function is equivalent to the following:

spark = SparkSession.builder \
    .appName("ner")\
    .master("local[4]")\
    .config("spark.driver.memory","16G")\
    .config("spark.driver.maxResultSize", "2G") \
    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.2")\
    .config("spark.kryoserializer.buffer.max", "1000m")\
    .getOrCreate()
gordenstein72 commented 4 years ago

Hi guys, I've been getting this same "TypeError: 'JavaPackage' object is not callable" error trying to call DocumentAssembler.

Python version: 3.7.6
Environment: Anaconda3 (Jupyter notebooks)
OS: Windows 10

I've tried all of the install methods in the https://nlp.johnsnowlabs.com/docs/en/install 'cheat sheet', as well as manually adding the JAR to the pyspark jars folder in my Anaconda installation.

I've also tried running my SparkSession with sparknlp.start() and adding .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.3") to the regular SparkSession.builder call.

Thanks :)

maziyarpanahi commented 4 years ago

Hi @gordenstein72

Sometimes when you add another technology on top of your stack (Jupyter), it's hard to see the real error, so let's try the pure pyspark PyPI package:

$ java -version
# should be Java 8 (Oracle or OpenJDK)

$ conda create -n sparknlp python=3.6 -y
$ conda activate sparknlp
$ pip install spark-nlp==2.5.3 pyspark==2.4.4

# do not close this terminal or deactivate this env
# now run `python` here to open a Python console
# then run the following commands to see what the real issue might be
import sparknlp
# make sure the next command doesn't produce any errors or failed downloads
# this is where the JAR and its dependencies are downloaded,
# so if you are behind a firewall or a proxy, or you lose internet connectivity,
# the JAR won't load and you will see that error
spark = sparknlp.start()

spark.version
sparknlp.version()

That's all you need to use Spark NLP in PySpark. If you want to run more tests in the same Python console, you can follow these:

from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp.embeddings import *
from sparknlp.pretrained import PretrainedPipeline
import sparknlp

# Start Spark Session with Spark NLP
spark = sparknlp.start()

# Download a pre-trained pipeline 
pipeline = PretrainedPipeline('explain_document_dl', lang='en')

# Your testing dataset
text = """
The Mona Lisa is a 16th-century oil painting created by Leonardo.
It's held at the Louvre in Paris.
"""

# Annotate your testing dataset
result = pipeline.annotate(text)

# What's in the pipeline
list(result.keys())

# Check the results
result['entities']

If you experience any issues, please provide detailed steps of how you installed it, your environment, the full code, and the full error so we can reproduce it.

msteller-Ai commented 3 years ago

This issue appears again in Databricks on Spark 3.0.1 and spark-nlp 2.6.0 (7.4 ML cluster). Could you advise how to solve it there?

maziyarpanahi commented 3 years ago

@msteller-Ai We don't support Spark 3.x yet; you can find the supported versions and Databricks runtimes here:

https://github.com/JohnSnowLabs/spark-nlp#apache-spark-support

mzeidhassan commented 3 years ago

Hi @maziyarpanahi , Sorry to jump in. Any idea when you plan to support "spark-3.0.1-bin-hadoop3.2"?

Thanks

maziyarpanahi commented 3 years ago

No worries. The first RC1 will be out on the 12th of March, and hopefully the final release by the end of March.

gabrielenizzoli commented 3 years ago

This is such amazing news! Very much looking forward to testing it.



mzeidhassan commented 3 years ago

Thanks a million for this great news!

bjorn-johnson commented 3 years ago

> This issue appears again in Databricks on Spark 3.0.1 and spark-nlp 2.6.0 (7.4 ML cluster). Could you advise how to solve it there?

Hey, I see that spark-nlp now supports Spark 3.0. I'm getting the same error in Databricks. Any advice now?

maziyarpanahi commented 3 years ago

We released a release candidate and announced it on Slack. There is no final release available yet; when we publish the final release it will appear in the release notes, the README, and the Maven repository as Spark NLP 3.0.0. (The old artifacts can't just become compatible.)

maziyarpanahi commented 3 years ago

FYI: https://github.com/JohnSnowLabs/spark-nlp/releases/tag/3.0.0-rc8

franckjay commented 3 years ago

I know this is closed, but if you are submitting this via a Qubole scheduled job, add --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:3.0.2 to the Spark Submit Command Line Options section.
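For reference, the same coordinates can also be passed to a plain session builder in Python (a sketch; the app name is arbitrary):

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("qubole-job") \
    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:3.0.2") \
    .getOrCreate()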

maziyarpanahi commented 3 years ago

Thanks @franckjay

Just to expand for future users:

If you are using Spark NLP on Spark/PySpark 2.3.x or 2.4.x:

If you are using Spark NLP on Spark/PySpark 3.0.x or 3.1.x:
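(To the best of my knowledge, the matching Maven artifacts are com.johnsnowlabs.nlp:spark-nlp-spark23_2.11 and com.johnsnowlabs.nlp:spark-nlp-spark24_2.11 for Spark 2.3.x and 2.4.x respectively, and com.johnsnowlabs.nlp:spark-nlp_2.12 for Spark 3.0.x/3.1.x; double-check the Apache Spark support table in the README for the exact coordinates.)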

latifaait commented 3 years ago

I got the same error. I use Java 11.0.10, Spark NLP version 3.0.3, and Apache Spark version 3.1.1. I am on Windows, using a Jupyter notebook. Please help; this is the error:

TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>
      5
      6 spark = sparknlp.start()
----> 7 documentAssembler = DocumentAssembler()\
      8     .setInputCol("text")\
      9     .setOutputCol("document")

~\anaconda3\lib\site-packages\pyspark\__init__.py in wrapper(self, *args, **kwargs)
    112             raise TypeError("Method %s forces keyword arguments." % func.__name__)
    113         self._input_kwargs = kwargs
--> 114         return func(self, **kwargs)
    115     return wrapper
    116

~\anaconda3\lib\site-packages\sparknlp\base.py in __init__(self)
    161     @keyword_only
    162     def __init__(self):
--> 163         super(DocumentAssembler, self).__init__(classname="com.johnsnowlabs.nlp.DocumentAssembler")
    164         self._setDefault(outputCol="document", cleanupMode='disabled')
    165

~\anaconda3\lib\site-packages\pyspark\__init__.py in wrapper(self, *args, **kwargs)
    112             raise TypeError("Method %s forces keyword arguments." % func.__name__)
    113         self._input_kwargs = kwargs
--> 114         return func(self, **kwargs)
    115     return wrapper
    116

~\anaconda3\lib\site-packages\sparknlp\internal.py in __init__(self, classname)
     85         self.setParams(**kwargs)
     86         self.__class__._java_class_name = classname
---> 87         self._java_obj = self._new_java_obj(classname, self.uid)
     88
     89

~\anaconda3\lib\site-packages\pyspark\ml\wrapper.py in _new_java_obj(java_class, *args)
     64             java_obj = getattr(java_obj, name)
     65         java_args = [_py2java(sc, arg) for arg in args]
---> 66         return java_obj(*java_args)
     67
     68     @staticmethod

TypeError: 'JavaPackage' object is not callable

maziyarpanahi commented 3 years ago

> I use Java 11.0.10!

Requirements: https://github.com/JohnSnowLabs/spark-nlp#requirements

latifaait commented 3 years ago

I have issues configuring Java 8 with PySpark; only Java 11.0.10 works for me. I uninstalled Java 11 and used Java 8, and the Spark session didn't work. Thank you so much for the reply.

maziyarpanahi commented 3 years ago

I am sorry, it's really hard to help with a minimal amount of information that is provided and then changed later. Please create a new issue, complete the template, provide all the required information (including OS, versions, etc.), provide steps to reproduce the error, and include any necessary code snippets.