JohnSnowLabs / spark-nlp

State of the Art Natural Language Processing
https://sparknlp.org/
Apache License 2.0
3.8k stars 707 forks source link

SparkNLP 2.6.2 throws "Unsupported Model: SentenceDetectorDLModel" #1080

Closed alexott closed 3 years ago

alexott commented 3 years ago

Description

I was trying the code from the sentence detector demo, but it gives the error shown below:

Expected Behavior

example works without error

Current Behavior

code throws the error:

>>> documenter = DocumentAssembler()\
...     .setInputCol("text")\
...     .setOutputCol("document")
>>>
>>> sentencerDL = SentenceDetectorDLModel\
...   .pretrained("sentence_detector_dl", "en") \
...   .setInputCols(["document"]) \
...   .setOutputCol("sentences")
...
Approximate size to download 307.2 KB
[OK!]
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/Users/alexey.ott/opt/anaconda3/envs/nlp1/lib/python3.7/site-packages/sparknlp/annotator.py", line 2953, in pretrained
    return ResourceDownloader.downloadModel(SentenceDetectorDLModel, name, lang, remote_loc)
  File "/Users/alexey.ott/opt/anaconda3/envs/nlp1/lib/python3.7/site-packages/sparknlp/pretrained.py", line 41, in downloadModel
    j_obj = _internal._DownloadModel(reader.name, name, language, remote_loc, j_dwn).apply()
  File "/Users/alexey.ott/opt/anaconda3/envs/nlp1/lib/python3.7/site-packages/sparknlp/internal.py", line 176, in __init__
    super(_DownloadModel, self).__init__("com.johnsnowlabs.nlp.pretrained."+validator+".downloadModel", reader, name, language, remote_loc)
  File "/Users/alexey.ott/opt/anaconda3/envs/nlp1/lib/python3.7/site-packages/sparknlp/internal.py", line 129, in __init__
    self._java_obj = self.new_java_obj(java_obj, *args)
  File "/Users/alexey.ott/opt/anaconda3/envs/nlp1/lib/python3.7/site-packages/sparknlp/internal.py", line 139, in new_java_obj
    return self._new_java_obj(java_class, *args)
  File "/Users/alexey.ott/opt/anaconda3/envs/nlp1/lib/python3.7/site-packages/pyspark/ml/wrapper.py", line 67, in _new_java_obj
    return java_obj(*java_args)
  File "/Users/alexey.ott/opt/anaconda3/envs/nlp1/lib/python3.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/Users/alexey.ott/opt/anaconda3/envs/nlp1/lib/python3.7/site-packages/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/Users/alexey.ott/opt/anaconda3/envs/nlp1/lib/python3.7/site-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel.
: java.lang.RuntimeException: Unsupported Model: SentenceDetectorDLModel
    at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$$anonfun$10.apply(ResourceDownloader.scala:465)
    at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$$anonfun$10.apply(ResourceDownloader.scala:465)
    at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
    at scala.collection.AbstractMap.getOrElse(Map.scala:59)
    at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.downloadModel(ResourceDownloader.scala:465)
    at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel(ResourceDownloader.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

Steps to Reproduce

  1. Install fresh version of spark-nlp
  2. run python & execute the code from the notebook

Context

just playing

Your Environment

maziyarpanahi commented 3 years ago

Thanks for reporting this. The model is missing from our ResourceDownloader and we missed this in our unit tests. Will fix it in the next hot-fix release.

davidlenz commented 3 years ago

I would really like to change the code on my local machine before the next hot-fix release is out to experiment with the SentenceDetectorDL. However i just can't find the files you changed in the relevant pull request. Any hints where those are located?

maziyarpanahi commented 3 years ago

@davidlenz If you need to test the PR, you need to:

This will give you a FAT JAR with the fix (inside target/...). Then you can start your SparkSession with that Fat JAR:

spark = SparkSession.builder \
    .appName("Spark NLP")\
    .master("local[4]")\
    .config("spark.driver.memory","16G")\
    .config("spark.driver.maxResultSize", "2G") \
    .config("spark.jars", "/spark-nlp/target/scala-2.11/spark-nlp-assembly-2.6.2.jar")\
    .config("spark.kryoserializer.buffer.max", "1000M")\
    .getOrCreate()

Then you have the fix from that PR. No need to do anything to the PyPI spark-nlp, the fix is all on the JAR side.

davidlenz commented 3 years ago

Thanks a lot for the quick reply @maziyarpanahi ! I will try to do as you advised.

maziyarpanahi commented 3 years ago

I have made a release with the fix just in case you want to test it:

pip install spark-nlp==2.6.3-rc1 and the sparknlp.start() will take care of the rest or if needed we have 2.6.3-rc1 on the Maven as well

davidlenz commented 3 years ago

Thanks a lot! Works great for me. Really happy with the splitting results from the multilingual model so far.

maziyarpanahi commented 3 years ago

Thanks a lot! Works great for me. Really happy with the splitting results from the multilingual model so far.

Thank you for your feedback, appreciate it 😃