My-Git-User-Name closed this issue 4 years ago
You're calling the actions on the wrong object. They need to be called on the DataFrame, but your object is a SparkSession, as the error message says.
This is user error. I'd recommend restarting the tutorial from the beginning, and it should work. There's no error in the book here, and I don't believe it's a Spark version issue either, because this part of Spark hasn't changed.
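To make the distinction concrete, here is a minimal sketch. It assumes a spark-shell session (or a Scala script with Spark on the classpath) and the CSV path posted in this thread; the names sessionOnly, csvPath and firstRows, the local master and the app name are my own placeholders, not something from the book.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

// In spark-shell `spark` already exists and getOrCreate() hands back that session.
// Outside the shell this starts a local one; master and app name are assumptions.
val spark: SparkSession = SparkSession.builder().master("local[*]").appName("flight-data-sketch").getOrCreate()

// Stopping after `spark` binds the name to the session itself, not to any data.
val sessionOnly: SparkSession = spark
// sessionOnly.take(3)  // does not compile: value take is not a member of SparkSession

// Path as posted in this thread; point it at your own copy of the book's data.
val csvPath = "C:/Users/cciesl/Downloads/SparkDefinitiveGuide/Spark-The-Definitive-Guide-master/data/flight-data/csv/2015-summary.csv"

// The full chain, kept on one line so the shell reads it as a single expression.
val flightData2015: DataFrame = spark.read.option("inferSchema", "true").option("header", "true").csv(csvPath)

// Actions such as take, sort and show are defined on the DataFrame.
val firstRows: Array[Row] = flightData2015.take(3)
firstRows.foreach(println)
```

If the type printed after the val line is SparkSession or DataFrameReader rather than DataFrame, the chain was cut short before .csv(...) was applied.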
I followed the book step by step as you suggested, tracing back to the beginning, and arrived at the same error.
scala> flightData2015.take(3)
scala> val flightData2015 = spark
flightData2015: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@5980e617
scala> .read
res0: org.apache.spark.sql.DataFrameReader = org.apache.spark.sql.DataFrameReader@bcdca9e5
scala> .option("inferSchema","true")
res1: org.apache.spark.sql.DataFrameReader = org.apache.spark.sql.DataFrameReader@bcdca9e5
scala> .option("header","true")
res2: org.apache.spark.sql.DataFrameReader = org.apache.spark.sql.DataFrameReader@bcdca9e5
scala> .csv("C:/Users/cciesl/Downloads/SparkDefinitiveGuide/Spark-The-Definitive-Guide-master/data/flight-data/csv/2015-summary.csv")
res3: org.apache.spark.sql.DataFrame = [DEST_COUNTRY_NAME: string, ORIGIN_COUNTRY_NAME: string ... 1 more field]
scala> flightData2015.take(3)
You're entering it incorrectly.
It needs to be executed as a single code block, not as individual lines.
Copy and paste the entire block; don't enter it line by line.
val flightData2015 = spark
  .read
  .option("inferSchema","true")
  .option("header","true")
  .csv("C:/Users/cciesl/Downloads/SparkDefinitiveGuide/Spark-The-Definitive-Guide-master/data/flight-data/csv/2015-summary.csv")
Thank you. Even the version you pasted didn't work. I guess the shell doesn't like the line breaks, so I submitted the command as one single line (no spaces) instead, and that appeared to fix it.
Actually, a better way is to type :paste at the scala> prompt, which puts the REPL in paste mode. Paste the whole block, then press Ctrl-D to run it.
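For anyone landing here later, this is roughly what the two multi-line options look like. It is only a sketch: it assumes spark-shell (or a script with Spark on the classpath) and the CSV path from this thread, and csvPath and viaTrailingDots are names I made up for illustration. Run the whole block in paste mode or as a script, since the first chain uses leading dots.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// spark-shell predefines `spark`; getOrCreate() returns that session.
// Outside the shell this starts a local one; master and app name are assumptions.
val spark: SparkSession = SparkSession.builder().master("local[*]").appName("paste-mode-sketch").getOrCreate()

// Path as posted in this thread; adjust it to your own copy of the data.
val csvPath = "C:/Users/cciesl/Downloads/SparkDefinitiveGuide/Spark-The-Definitive-Guide-master/data/flight-data/csv/2015-summary.csv"

// Option 1: paste mode. At the scala> prompt type :paste, paste this statement,
// then press Ctrl-D. The shell compiles it as one expression, so leading dots are fine.
val flightData2015: DataFrame = spark
  .read
  .option("inferSchema", "true")
  .option("header", "true")
  .csv(csvPath)

// Option 2: trailing dots. A line ending in `.` is an incomplete expression,
// so the shell keeps reading even when the lines are typed one at a time.
val viaTrailingDots: DataFrame = spark.
  read.
  option("inferSchema", "true").
  option("header", "true").
  csv(csvPath)

// Both values are DataFrames, so the actions from Chapter 2 are available.
println(flightData2015.take(3).mkString("\n"))
```

The leading-dot style matches the book's layout, which is why it needs paste mode; the trailing-dot style is uglier but survives being typed line by line.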
It works if you run flightData2015 take(3) (with a space) instead of flightData2015.take(3)
In Chapter 2, when loading the flightData2015 CSV, the action commands do not work at the Scala prompt in Spark 2.4.4. You get messages like:
:37 error: value take is not a member of org.apache.spark.sql.SparkSession
:37 error: value sort is not a member of org.apache.spark.sql.SparkSession
:37 error: value show is not a member of org.apache.spark.sql.SparkSession
According to some Stack Overflow posts, the cause is the wrong version of Maven. However, the README file for the Databricks download says that downloading Maven separately is not needed, because it is included in the pre-built package that I downloaded from Chapter 1.
Do you have any suggestions on how to fix this? I don't see any Maven directories or files in the pre-built package spark-2.4.4-bin-hadoop2.7.gz.