JohnSnowLabs / spark-nlp

State of the Art Natural Language Processing
https://sparknlp.org/
Apache License 2.0

java.lang.NoClassDefFoundError: Could not initialize class org.tensorflow.Graph when using pretrained model #14016

Open hilmar05 opened 11 months ago

hilmar05 commented 11 months ago

Who can help?

No response

What are you working on?

I am trying to get spark-nlp to work on Databricks using an example from the documentation.

Current Behavior

sentence_detector_dl download started this may take some time.
Approximate size to download 514.9 KB
[ / ]
An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel.
: java.lang.NoClassDefFoundError: Could not initialize class org.tensorflow.Graph
    at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.readGraph(TensorflowWrapper.scala:415)
    at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.unpackWithoutBundle(TensorflowWrapper.scala:330)
    at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.read(TensorflowWrapper.scala:484)
    at com.johnsnowlabs.ml.tensorflow.ReadTensorflowModel.readTensorflowModel(TensorflowSerializeModel.scala:154)
    at com.johnsnowlabs.ml.tensorflow.ReadTensorflowModel.readTensorflowModel$(TensorflowSerializeModel.scala:123)
    at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLModel$.readTensorflowModel(SentenceDetectorDLModel.scala:648)
    at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.ReadsSentenceDetectorDLGraph.readSentenceDetectorDLGraph(SentenceDetectorDLModel.scala:621)
    at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.ReadsSentenceDetectorDLGraph.readSentenceDetectorDLGraph$(SentenceDetectorDLModel.scala:616)
    at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLModel$.readSentenceDetectorDLGraph(SentenceDetectorDLModel.scala:648)
    at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.ReadsSentenceDetectorDLGraph.$anonfun$$init$$1(SentenceDetectorDLModel.scala:625)
    at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.ReadsSentenceDetectorDLGraph.$anonfun$$init$$1$adapted(SentenceDetectorDLModel.scala:625)
    at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1(ParamsAndFeaturesReadable.scala:50)
    at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1$adapted(ParamsAndFeaturesReadable.scala:49)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.onRead(ParamsAndFeaturesReadable.scala:49)
    at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1(ParamsAndFeaturesReadable.scala:61)
    at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1$adapted(ParamsAndFeaturesReadable.scala:61)
    at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:38)
    at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:24)
    at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:518)
    at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:510)
    at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.downloadModel(ResourceDownloader.scala:709)
    at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel(ResourceDownloader.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
    at py4j.Gateway.invoke(Gateway.java:306)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
    at java.lang.Thread.run(Thread.java:750)

Expected Behavior

Code should run without any errors.

Steps To Reproduce

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, MarianTransformer
from pyspark.ml import Pipeline
document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx").setInputCols("document").setOutputCol("sentence")

marian_transformer = MarianTransformer.pretrained().setInputCols("sentence").setOutputCol("translation")

pipeline = Pipeline().setStages([document_assembler, sentence_detector, marian_transformer])

data = spark.createDataFrame([["You can use Spark NLP to translate text. "
                               "This example pipeline translates English to French"]]).toDF("text")

# Create a pipeline model that can be reused across multiple data frames
model = pipeline.fit(data)

# You can use the model on any data frame that has a "text" column
result = model.transform(data)

display(result.select("text", "translation.result"))

Spark NLP version and Apache Spark

Spark NLP version: 5.1.2
Spark version: 3.4.1

Databricks Runtime Version: 13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12)

Type of Spark Application

Python Application

Java Version

No response

Java Home Directory

No response

Setup and installation

I installed the libraries below directly on the cluster.

PyPI: spark-nlp==5.1.2
Maven: com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.2
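
As a quick sanity check on this setup, the sketch below compares the PyPI pin with the Maven coordinate and with the Scala version of the runtime. The helper names (`parse_pypi`, `parse_maven`, `check_install`) are hypothetical and not part of Spark NLP. Only the two coordinate strings and the Scala version are taken from this report. It assumes the Python package and the JAR are meant to carry the same version number.

```python
import re
from typing import List, Tuple


def parse_pypi(spec: str) -> str:
    """Return the pinned version from a 'name==version' requirement."""
    _name, _, version = spec.partition("==")
    if not version:
        raise ValueError(f"not a pinned requirement: {spec!r}")
    return version.strip()


def parse_maven(coord: str) -> Tuple[str, str]:
    """Return (scala_binary_version, artifact_version) from
    a 'group:artifact_scalaVersion:version' coordinate."""
    _group, artifact, version = coord.split(":")
    match = re.fullmatch(r"(.+)_(\d+\.\d+)", artifact)
    if match is None:
        raise ValueError(f"no Scala suffix in artifact: {artifact!r}")
    return match.group(2), version


def check_install(pypi_spec: str, maven_coord: str, runtime_scala: str) -> List[str]:
    """Collect human-readable mismatches; an empty list means consistent."""
    problems = []
    py_version = parse_pypi(pypi_spec)
    jar_scala, jar_version = parse_maven(maven_coord)
    if py_version != jar_version:
        problems.append(f"PyPI {py_version} != Maven {jar_version}")
    if jar_scala != runtime_scala:
        problems.append(f"JAR built for Scala {jar_scala}, runtime has {runtime_scala}")
    return problems


# Values copied from this report (Databricks Runtime 13.3 LTS ships Scala 2.12).
problems = check_install(
    "spark-nlp==5.1.2",
    "com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.2",
    "2.12",
)
print(problems or "versions are consistent")
```

For the coordinates in this report the list comes back empty, so a version mismatch between the two packages does not appear to be the cause.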

Operating System and Version

No response

Link to your project (if available)

No response

Additional Information

No response

hilmar05 commented 11 months ago

I managed to get it to work by switching to an ML cluster. I'm not sure why it did not work on a normal (non-ML) cluster.

maziyarpanahi commented 11 months ago

Thanks for the update, @hilmar05. If you don't mind, I'll keep this open until we find out why the base runtime failed. Everything is packaged inside the library, so no third-party dependency is needed. We'll investigate this further.