radianv opened this issue 6 years ago
As additional information, I ran the same test connecting directly to the spark-master container, and it worked fine:
```
scala> val textFile = sc.textFile("/user/root/vannbehandlingsanlegg.csv")
textFile: org.apache.spark.rdd.RDD[String] = /user/root/vannbehandlingsanlegg.csv MapPartitionsRDD[1] at textFile at <console>:...

scala> textFile.count
res4: Long = 4385
```
Probably the issue is in the Spark Notebook configuration.
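One cheap thing to try from the notebook side is a fully qualified HDFS URI instead of a bare path, so the driver cannot silently fall back to its local filesystem. This is only a sketch; the `namenode` hostname and port 8020 are assumptions based on the usual docker-compose service names, not something confirmed in this thread:

```scala
// Read with a fully qualified HDFS URI instead of a bare path, so the
// notebook driver cannot silently fall back to its local filesystem.
// "namenode" and port 8020 are assumed docker-compose values; adjust as needed.
val textFile = sc.textFile("hdfs://namenode:8020/user/root/vannbehandlingsanlegg.csv")
println(textFile.count())
```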
Hi @radianv,
sorry for the late reply. I had a lot of issues with Spark Notebook and switched to Apache Zeppelin in the end. The issue you had is most likely a Spark version mismatch between Spark Notebook and the Spark master.
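If you want to rule a version mismatch out quickly, comparing both sides is cheap. A minimal sketch: run this in a notebook cell, then run the same two lines in `spark-shell` inside the spark-master container and compare the output:

```scala
// Spark version seen by the notebook driver.
println(s"Spark: ${sc.version}")
// Scala version the notebook kernel is running on.
println(s"Scala: ${scala.util.Properties.versionString}")
```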
I have the same issue! Any solution?
The same call also fails inside the spark-master container:

```
val textFile = sc.textFile("/user/root/vannbehandlingsanlegg.csv")
```
Judging from the abundance of errors in the issues related to HDFS and nodes/workers, it seems like something in the configuration is definitely missing.
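For what it's worth, here is a minimal sketch of a smoke test with the configuration spelled out explicitly, to rule out the notebook silently running in local mode. The `spark-master:7077` master URL is an assumption based on the usual docker-compose service name for this setup:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Standalone smoke test with the master URL spelled out explicitly, rather
// than relying on whatever default the notebook picks up.
// "spark-master:7077" is the assumed service name/port from the docker-compose file.
val conf = new SparkConf()
  .setAppName("config-smoke-test")
  .setMaster("spark://spark-master:7077")
val sc = new SparkContext(conf)
println(sc.parallelize(1 to 100).sum()) // trivial job that must execute on the workers
```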
It is also worth noting that the steps in the walk-through blog post do not work: https://www.big-data-europe.eu/scalable-sparkhdfs-workbench-using-docker/
Can anyone successfully complete the steps in this ^^^^ blog post?
I am working with Spark Notebook, following the Scalable Spark/HDFS Workbench using Docker blog post:
```
val textFile = sc.textFile("/user/root/vannbehandlingsanlegg.csv")
textFile: org.apache.spark.rdd.RDD[String] = /user/root/vannbehandlingsanlegg.csv MapPartitionsRDD[1] at textFile at <console>:67
```
It should show the execution time and the number of lines in the CSV file, but instead I got the following error:
```
cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
```

I have been searching, and it seems it could be about executor dependencies. Any ideas?
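For what it's worth, that particular deserialization error is commonly associated with the driver and executors running different Spark (or Scala) builds. A hedged diagnostic sketch, assuming the cluster will run a job at all:

```scala
// Compare the Spark version compiled into the driver with the one the
// executors actually load; a difference matches the SerializationProxy error.
// If this tiny job itself dies with the same error, that also points at a mismatch.
val driverVersion = org.apache.spark.SPARK_VERSION
val executorVersion = sc.parallelize(Seq(1)).map(_ => org.apache.spark.SPARK_VERSION).first()
println(s"driver=$driverVersion, executors=$executorVersion")
```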