slice-niharika opened this issue 1 month ago
Is there an existing issue for this?
Who can help?
Anyone who has used Spark-NLP in EMR
What are you working on?
I am trying to run it on sample data, just getting started with it.
Current Behavior
I am running into a dependency issue. I do not have a Docker setup. I just need to understand where and why it is failing.
Expected Behavior
I expect the pretrained pipeline to load. Is there a specific version combination of TensorFlow and Spark NLP that might help? I have used all the versions as per the documentation.
Steps To Reproduce
PS: I had TensorFlow and Spark NLP installed on the cluster.
```
%%configure -f
{
  "conf": {
    "spark.yarn.stagingDir": "hdfs:///tmp",
    "spark.yarn.preserve.staging.files": "true",
    "spark.kryoserializer.buffer.max": "2000M",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.driver.maxResultSize": "0",
    "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3"
  }
}
```
```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline
from sparknlp.annotator import *
from sparknlp.base import *

spark = sparknlp.start()
pipeline = PretrainedPipeline("explain_document_dl", lang="en")
```
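One thing worth ruling out before digging deeper: the Scala binary suffix in the `spark.jars.packages` coordinate (`spark-nlp_2.12`) must match the Scala version the cluster's Spark build uses. EMR 6.9.0 bundles Spark 3.3.0 built against Scala 2.12, so the coordinate above looks right, but a 2.12/2.13 mismatch is a common cause of this kind of dependency failure. A minimal sketch of the check (the helper below is hypothetical, not part of the Spark NLP API):

```python
def check_spark_nlp_coordinate(coordinate: str, cluster_scala: str) -> bool:
    """Hypothetical helper: return True if the artifact suffix in a Maven
    coordinate like 'com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3' matches
    the cluster's Scala binary version (e.g. '2.12')."""
    artifact = coordinate.split(":")[1]   # e.g. "spark-nlp_2.12"
    suffix = artifact.rsplit("_", 1)[1]   # e.g. "2.12"
    return suffix == cluster_scala

# EMR 6.9.0 ships Spark 3.3.0 built with Scala 2.12, so the coordinate
# from the configure cell should check out:
print(check_spark_nlp_coordinate(
    "com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3", "2.12"))  # → True
```

If this check passes, the failure is more likely a transitive-dependency or classpath issue than a wrong artifact name.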
Spark NLP version and Apache Spark
Spark version: 3.3.0
Spark NLP version: 5.3.3
EMR version: 6.9.0
Type of Spark Application
spark-shell
Java Version
No response
Java Home Directory
No response
Setup and installation
No response
Operating System and Version
No response
Link to your project (if available)
No response
Additional Information
No response