JohnSnowLabs / spark-nlp

State of the Art Natural Language Processing
https://sparknlp.org/
Apache License 2.0

Issue while installing spark-nlp in EMR: java.lang.UnsatisfiedLinkError: no jnitensorflow in java.library.path #14310

Open slice-niharika opened 1 month ago

slice-niharika commented 1 month ago

Is there an existing issue for this?

Who can help?

Anyone who has used Spark-NLP in EMR

What are you working on?

I am just getting started with Spark NLP and am trying to run it on sample data.

Current Behavior

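I am hitting a dependency issue: loading a pretrained pipeline fails with `java.lang.UnsatisfiedLinkError: no jnitensorflow in java.library.path`. I do not have a Docker setup. I would like to understand where and why it is failing.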

Expected Behavior

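The pretrained pipeline should load without the native-library error. If a specific version combination of TensorFlow and Spark NLP is known to work on EMR, that would help. I have tried all the versions listed in the documentation.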

Steps To Reproduce

PS: TensorFlow and Spark NLP were both installed on the cluster.

%%configure -f
{
  "conf": {
    "spark.yarn.stagingDir": "hdfs:///tmp",
    "spark.yarn.preserve.staging.files": "true",
    "spark.kryoserializer.buffer.max": "2000M",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.driver.maxResultSize": "0",
    "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3"
  }
}

import sparknlp
from sparknlp.pretrained import PretrainedPipeline
from sparknlp.annotator import *
from sparknlp.base import *

spark = sparknlp.start()
pipeline = PretrainedPipeline("explain_document_dl", lang="en")
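The `%%configure` body has to be valid JSON, and the Maven coordinate in `spark.jars.packages` has to match the Spark NLP version and Scala build in use. The following is a minimal sketch of generating that body programmatically. `build_configure_payload` is a hypothetical helper name, not part of Spark NLP or EMR, and the default values are simply the ones from this report.

```python
import json


def build_configure_payload(spark_nlp_version="5.3.3", scala_version="2.12"):
    """Return the JSON text for a %%configure cell (hypothetical helper)."""
    conf = {
        "spark.yarn.stagingDir": "hdfs:///tmp",
        "spark.yarn.preserve.staging.files": "true",
        "spark.kryoserializer.buffer.max": "2000M",
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.driver.maxResultSize": "0",
        # Maven coordinate: group:artifact_<scala version>:<spark-nlp version>
        "spark.jars.packages": (
            f"com.johnsnowlabs.nlp:spark-nlp_{scala_version}:{spark_nlp_version}"
        ),
    }
    return json.dumps({"conf": conf}, indent=2)


# Paste the printed JSON after "%%configure -f" in the notebook cell.
print(build_configure_payload())
```

Building the string from one version variable makes it harder for the JAR coordinate and the installed Python package to drift apart while trying different version combinations.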

Spark NLP version and Apache Spark

Spark version: 3.3.0
Spark NLP version: 5.3.3
EMR version: 6.9.0
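Because the error concerns a native library lookup, the driver JVM's own view of its environment may be useful alongside the versions above. This is a rough sketch, assuming a live `SparkSession` named `spark`. `collect_environment` is a hypothetical helper, and it reaches the JVM through the private `_jvm` py4j handle, which is not a stable public API.

```python
def collect_environment(spark):
    """Gather details relevant to an UnsatisfiedLinkError from a live session."""
    # Assumption: _jvm is PySpark's private py4j gateway to the driver JVM.
    jvm_system = spark.sparkContext._jvm.java.lang.System
    return {
        "spark": spark.version,
        "java.version": jvm_system.getProperty("java.version"),
        # Directories the JVM searches when loading native libraries.
        "java.library.path": jvm_system.getProperty("java.library.path"),
        # Native libraries are platform-specific, so the CPU architecture matters.
        "os.arch": jvm_system.getProperty("os.arch"),
        "spark.jars.packages": spark.conf.get("spark.jars.packages", "<not set>"),
    }


# Usage in a notebook cell with an active session:
#   for key, value in collect_environment(spark).items():
#       print(f"{key}: {value}")
```

The output reflects only the driver JVM. Executors run in separate JVMs and may report different values.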

Type of Spark Application

spark-shell

Java Version

No response

Java Home Directory

No response

Setup and installation

No response

Operating System and Version

No response

Link to your project (if available)

No response

Additional Information

No response