slice-niharika opened this issue 1 month ago
Is there an existing issue for this?
Who can help?
Anyone who has used Spark-NLP in EMR
What are you working on?
I am trying to run it on sample data, just getting started with it.
Current Behavior
I am running into a dependency issue. I do not have a Docker setup. I just need to understand where and why it is failing.
Expected Behavior
I expect the pretrained pipeline to load. Is there a specific version combination of TensorFlow and Spark NLP that might help? I have used all the versions as per the documentation.
Steps To Reproduce
PS: I had TensorFlow and Spark NLP installed on the cluster.
```
%%configure -f
{
  "conf": {
    "spark.yarn.stagingDir": "hdfs:///tmp",
    "spark.yarn.preserve.staging.files": "true",
    "spark.kryoserializer.buffer.max": "2000M",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.driver.maxResultSize": "0",
    "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3"
  }
}
```
```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline
from sparknlp.annotator import *
from sparknlp.base import *

spark = sparknlp.start()
pipeline = PretrainedPipeline("explain_document_dl", lang="en")
```
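One thing worth ruling out before digging deeper: the Scala binary suffix in the `spark.jars.packages` coordinate (`spark-nlp_2.12`) must match the Scala version the cluster's Spark build uses. EMR 6.9.0 bundles Spark 3.3.0 built against Scala 2.12, so the coordinate above looks right, but a 2.12/2.13 mismatch is a common cause of this kind of dependency failure. A minimal sketch of the check (the helper below is hypothetical, not part of the Spark NLP API):

```python
def check_spark_nlp_coordinate(coordinate: str, cluster_scala: str) -> bool:
    """Hypothetical helper: return True if the artifact suffix in a Maven
    coordinate like 'com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3' matches
    the cluster's Scala binary version (e.g. '2.12')."""
    artifact = coordinate.split(":")[1]   # e.g. "spark-nlp_2.12"
    suffix = artifact.rsplit("_", 1)[1]   # e.g. "2.12"
    return suffix == cluster_scala

# EMR 6.9.0 ships Spark 3.3.0 built with Scala 2.12, so the coordinate
# from the configure cell should check out:
print(check_spark_nlp_coordinate(
    "com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3", "2.12"))  # → True
```

If this check passes, the failure is more likely a transitive-dependency or classpath issue than a wrong artifact name.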
Spark NLP version and Apache Spark
Spark version: 3.3.0
Spark NLP version: 5.3.3
EMR version: 6.9.0
Type of Spark Application
spark-shell
Java Version
No response
Java Home Directory
No response
Setup and installation
No response
Operating System and Version
No response
Link to your project (if available)
No response
Additional Information
No response