SidWeng opened this issue 3 months ago
It turned out to be a dependency conflict with the breeze library. After removing the old version of the breeze library, another exception occurs:
05:38:00.344 [main] ERROR org.apache.spark.broadcast.TorrentBroadcast - Store broadcast broadcast_0 fail, remove all pieces of the broadcast
java.lang.NoClassDefFoundError: breeze/storage/Zero$DoubleZero$
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:218)
at org.apache.spark.serializer.KryoSerializer$.$anonfun$loadableSparkClasses$1(KryoSerializer.scala:537)
at scala.collection.immutable.List.flatMap(List.scala:366)
at org.apache.spark.serializer.KryoSerializer$.loadableSparkClasses$lzycompute(KryoSerializer.scala:535)
at org.apache.spark.serializer.KryoSerializer$.org$apache$spark$serializer$KryoSerializer$$loadableSparkClasses(KryoSerializer.scala:502)
at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:226)
at org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:102)
at com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48)
at org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:109)
at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:346)
at org.apache.spark.serializer.KryoSerializationStream.<init>(KryoSerializer.scala:266)
at org.apache.spark.serializer.KryoSerializerInstance.serializeStream(KryoSerializer.scala:432)
at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:319)
at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:140)
at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:95)
at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:75)
at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1529)
at org.apache.spark.SparkContext.$anonfun$hadoopFile$1(SparkContext.scala:1145)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:806)
at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1137)
at org.apache.spark.SparkContext.$anonfun$textFile$1(SparkContext.scala:940)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:806)
at org.apache.spark.SparkContext.textFile(SparkContext.scala:937)
at org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:587)
at org.apache.spark.ml.util.DefaultParamsReader.load(ReadWrite.scala:465)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:31)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:24)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:515)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:507)
at com.johnsnowlabs.nlp.HasPretrained.pretrained(HasPretrained.scala:44)
at com.johnsnowlabs.nlp.HasPretrained.pretrained$(HasPretrained.scala:41)
at com.johnsnowlabs.nlp.embeddings.MPNetEmbeddings$.com$johnsnowlabs$nlp$embeddings$ReadablePretrainedMPNetModel$$super$pretrained(MPNetEmbeddings.scala:474)
at com.johnsnowlabs.nlp.embeddings.ReadablePretrainedMPNetModel.pretrained(MPNetEmbeddings.scala:401)
at com.johnsnowlabs.nlp.embeddings.ReadablePretrainedMPNetModel.pretrained$(MPNetEmbeddings.scala:400)
at com.johnsnowlabs.nlp.embeddings.MPNetEmbeddings$.pretrained(MPNetEmbeddings.scala:474)
at com.johnsnowlabs.nlp.embeddings.MPNetEmbeddings$.pretrained(MPNetEmbeddings.scala:474)
at com.johnsnowlabs.nlp.HasPretrained.pretrained(HasPretrained.scala:47)
at com.johnsnowlabs.nlp.HasPretrained.pretrained$(HasPretrained.scala:47)
at com.johnsnowlabs.nlp.embeddings.MPNetEmbeddings$.com$johnsnowlabs$nlp$embeddings$ReadablePretrainedMPNetModel$$super$pretrained(MPNetEmbeddings.scala:474)
at com.johnsnowlabs.nlp.embeddings.ReadablePretrainedMPNetModel.pretrained(MPNetEmbeddings.scala:398)
at com.johnsnowlabs.nlp.embeddings.ReadablePretrainedMPNetModel.pretrained$(MPNetEmbeddings.scala:397)
at com.johnsnowlabs.nlp.embeddings.MPNetEmbeddings$.pretrained(MPNetEmbeddings.scala:474)
... 79 elided
Caused by: java.lang.ClassNotFoundException: breeze.storage.Zero$DoubleZero$
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 128 more
I guess it's related to Kryo, since I set KryoSerializer as the default serializer. It works fine after I unset KryoSerializer.
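For context, a minimal sketch of how that serializer setting is toggled (PySpark shown for brevity; in spark-shell the equivalent is --conf spark.serializer=... or an entry in conf/spark-defaults.conf):

from pyspark.sql import SparkSession

# Enabling Kryo is what drives KryoSerializer.loadableSparkClasses, the code
# path in the stack trace above that tries to load the breeze classes.
spark = (
    SparkSession.builder
    .appName("kryo-breeze-check")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)
# Dropping the .config(...) line falls back to the default JavaSerializer,
# which matches the observation above that the error then disappears.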
Please share some information: which environment you are in, which Spark version this is, and how you are installing and starting the SparkSession with Spark NLP.
OS: Ubuntu 20.04
Spark: 3.3.0
Java: 1.8.0_412
Installation: put spark-nlp-assembly-5.4.1.jar under SPARK_HOME/jars
Start SparkSession: SPARK_HOME/bin/spark-shell --master spark://master-ip:7077
Please use --jars PATH/spark-nlp-assembly-5.4.1.jar explicitly in your spark-shell command and try again. It seems there is a mismatch between the Spark NLP and Apache Spark versions.
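For example, combined with the master URL from your report:

SPARK_HOME/bin/spark-shell --master spark://master-ip:7077 --jars PATH/spark-nlp-assembly-5.4.1.jar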
If you can quickly do this in your Ubuntu terminal, it would be a great way to test everything:
conda create -n sparknlp python=3.8 -y
conda activate sparknlp
pip install spark-nlp==5.4.2 pyspark==3.3.1
Then, in the same terminal, open the Python console:
$ python
import sparknlp
spark = sparknlp.start()
# rest of your code
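If that session starts cleanly, a quick way to exercise the exact code path that failed above is to load the pretrained MPNet embeddings (a minimal smoke test, assuming the default pretrained model is acceptable):

from sparknlp.annotator import MPNetEmbeddings

# Smoke test: MPNetEmbeddings.pretrained() goes through the same
# ResourceDownloader / DefaultParamsReader path as the stack trace above.
embeddings = (
    MPNetEmbeddings.pretrained()
    .setInputCols(["document"])
    .setOutputCol("embeddings")
)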
Is there an existing issue for this?
Who can help?
No response
What are you working on?
train a classifier with MPNetEmbeddings
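A hypothetical sketch of such a pipeline (the actual training code was not shared; stage names and parameters below are illustrative):

from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import MPNetEmbeddings, ClassifierDLApproach

document = DocumentAssembler().setInputCol("text").setOutputCol("document")
embeddings = (
    MPNetEmbeddings.pretrained()
    .setInputCols(["document"])
    .setOutputCol("sentence_embeddings")
)
classifier = (
    ClassifierDLApproach()
    .setInputCols(["sentence_embeddings"])
    .setOutputCol("class")
    .setLabelColumn("label")
)
pipeline = Pipeline(stages=[document, embeddings, classifier])
# model = pipeline.fit(training_df)  # fit() is where the reported exception occurs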
Current Behavior
Throws the exception shown above during pipeline.fit().
Expected Behavior
Should not throw such an exception.
Steps To Reproduce
Spark NLP version and Apache Spark
Spark NLP: 5.4.1
Apache Spark: 3.3.0
Type of Spark Application
No response
Java Version
No response
Java Home Directory
No response
Setup and installation
No response
Operating System and Version
Ubuntu 20.04
Link to your project (if available)
No response
Additional Information
No response