databricks / Spark-The-Definitive-Guide

Spark: The Definitive Guide's Code Repository
http://shop.oreilly.com/product/0636920034957.do

not a member of org.apache.spark.sql.SparkSession #49

Closed My-Git-User-Name closed 4 years ago

My-Git-User-Name commented 4 years ago

In Chapter 2, when loading the flightData2015 CSV, the action commands do not work at the Scala prompt in Spark 2.4.4. You get messages like:

:37 error: value take is not a member of org.apache.spark.sql.SparkSession
:37 error: value sort is not a member of org.apache.spark.sql.SparkSession
:37 error: value show is not a member of org.apache.spark.sql.SparkSession

According to some Stack Overflow posts, the cause is the wrong version of Maven. However, the README file for the Databricks download said that downloading Maven separately was not needed because it is included in the pre-built package that I downloaded in Chapter 1.

Do you have any suggestions on how to fix this? I don't see any Maven directories or files in the pre-built package spark-2.4.4-bin-hadoop2.7.gz.

bllchmbrs commented 4 years ago

You're calling the actions on the wrong object. They need to be called on the DataFrame, but your object is a SparkSession, as the error message says. This is user error; I'd recommend starting the tutorial again from the beginning, and it should work. There's no error in the book here, and I don't believe it's a Spark version issue, because this part of Spark hasn't changed.
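To make the distinction in the reply above concrete, here is a minimal sketch (not from the book) that annotates each intermediate value with its type. The names `session`, `reader` and `csvPath` are illustrative, and the temporary CSV holds made-up stand-in rows so the snippet runs without the book's data; the third column name `count` is an assumption. Every statement is complete on its own line, so it can be typed into spark-shell one line at a time.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, DataFrameReader, SparkSession}

// In spark-shell a session named `spark` already exists; getOrCreate() returns it.
val session: SparkSession = SparkSession.builder().master("local[*]").appName("issue-49-types-sketch").getOrCreate()

// Stand-in data (made-up rows, NOT the contents of 2015-summary.csv).
val csvPath = Files.createTempFile("flights-sketch", ".csv")
Files.write(csvPath, "DEST_COUNTRY_NAME,ORIGIN_COUNTRY_NAME,count\nA,B,1\nC,D,2\nE,F,3\n".getBytes("UTF-8"))

// Each step of the chain has a different type; only the last one is a DataFrame.
val reader: DataFrameReader = session.read.option("inferSchema", "true").option("header", "true")
val flightData2015: DataFrame = reader.csv(csvPath.toUri.toString)

// take, sort and show are defined on DataFrame, so these compile.
flightData2015.take(3).foreach(println)
flightData2015.sort("DEST_COUNTRY_NAME").show(3)

// session.take(3) // would not compile: value take is not a member of SparkSession
```

Adding the `: DataFrame` annotation is a design choice for this sketch: if the name ends up bound to anything other than a DataFrame, the assignment itself is rejected, rather than a later call to `take`.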

My-Git-User-Name commented 4 years ago

I followed the book step by step as you suggested, tracing back to the beginning, and arrived at the same error.

scala> flightData2015.take(3)

:26: error: value take is not a member of org.apache.spark.sql.SparkSession
flightData2015.take(3)
               ^

I think the book assumes you are on Linux, and I am on Windows 10, so the path where I saved the data files differs from the book's path. Other than that, I followed all the same steps in the book and still arrived at the same error message.

My-Git-User-Name commented 4 years ago

scala> val flightData2015 = spark
flightData2015: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@5980e617

scala> .read
res0: org.apache.spark.sql.DataFrameReader = org.apache.spark.sql.DataFrameReader@bcdca9e5

scala> .option("inferSchema","true")
res1: org.apache.spark.sql.DataFrameReader = org.apache.spark.sql.DataFrameReader@bcdca9e5

scala> .option("header","true")
res2: org.apache.spark.sql.DataFrameReader = org.apache.spark.sql.DataFrameReader@bcdca9e5

scala> .csv("C:/Users/cciesl/Downloads/SparkDefinitiveGuide/Spark-The-Definitive-Guide-master/data/flight-data/csv/2015-summary.csv")
res3: org.apache.spark.sql.DataFrame = [DEST_COUNTRY_NAME: string, ORIGIN_COUNTRY_NAME: string ... 1 more field]

scala> flightData2015.take(3)

:26: error: value take is not a member of org.apache.spark.sql.SparkSession
flightData2015.take(3)
               ^

bllchmbrs commented 4 years ago

You're entering it incorrectly.

It needs to be executed as a single code block, not as individual code blocks.

Copy and paste the entire block; don't enter it line by line.

val flightData2015 = spark
  .read
  .option("inferSchema","true")
  .option("header","true")
  .csv("C:/Users/cciesl/Downloads/SparkDefinitiveGuide/Spark-The-Definitive-Guide-master/data/flight-data/csv/2015-summary.csv")

My-Git-User-Name commented 4 years ago

Thank you. Even the version you pasted didn't work; I guess the prompt doesn't like the line breaks. So I submitted the command as one single line (no spaces) instead, and that appeared to fix it.

zahidr commented 4 years ago

Actually, a better way is to type :paste, which puts the shell in paste mode. Paste the whole block, then press Ctrl+D to run it.
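For readers who prefer not to use :paste, here is a hedged sketch of two ways to write the same chain so that it survives line-by-line entry. In the transcript earlier in the thread, the shell evaluated `val flightData2015 = spark` as a complete statement and appears to have applied each following dot-prefixed line to the previous result, which left the DataFrame in `res3` rather than in `flightData2015`. The names `session`, `path`, `trailingDots` and `parenthesised` are illustrative, and the temporary CSV contains made-up stand-in rows (the column name `count` is an assumption), not the book's data.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

// In spark-shell a session named `spark` already exists; getOrCreate() returns it.
val session: SparkSession = SparkSession.builder().master("local[*]").appName("issue-49-repl-sketch").getOrCreate()

// Stand-in data (made-up rows, NOT the contents of 2015-summary.csv).
val tmp = Files.createTempFile("flights-sketch", ".csv")
Files.write(tmp, "DEST_COUNTRY_NAME,ORIGIN_COUNTRY_NAME,count\nA,B,1\nC,D,2\nE,F,3\n".getBytes("UTF-8"))
val path = tmp.toUri.toString

// Option 1: end each line with the dot. The expression is visibly unfinished,
// so the shell keeps reading until the final call.
val trailingDots: DataFrame = session.
  read.
  option("inferSchema", "true").
  option("header", "true").
  csv(path)

// Option 2: wrap the chain in parentheses. Nothing is evaluated until the
// closing parenthesis, so leading dots are safe inside.
val parenthesised: DataFrame = (
  session
    .read
    .option("inferSchema", "true")
    .option("header", "true")
    .csv(path)
)

trailingDots.take(3).foreach(println)
```

Both forms are ordinary Scala and also compile unchanged outside the shell, so the choice between them is purely stylistic.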

tiburcius commented 2 years ago

It works if you run flightData2015 take(3) (with a space) instead of flightData2015.take(3)