Closed by KMontano18 3 years ago
@KMontano18 The schema defined in the book is not what you have in the above code.
Please check page 53 and use the schema defined there:
```scala
val schema = StructType(Array(StructField("Id", IntegerType, false),
  StructField("First", StringType, false),
  StructField("Last", StringType, false),
  StructField("Url", StringType, false),
  StructField("Published", StringType, false),
  StructField("Hits", IntegerType, false),
  StructField("Campaigns", ArrayType(StringType), false)))
```
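For context, here is a minimal sketch of how that schema might be wired into a complete program. It assumes the path to blogs.json is passed as the first argument and that a local master is acceptable for a quick test. The object name `BlogsWithSchema` and the helper `readBlogs` are hypothetical and not taken from the book.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types._

// Hypothetical object name; the schema is the one quoted above.
object BlogsWithSchema {
  val schema: StructType = StructType(Array(
    StructField("Id", IntegerType, false),
    StructField("First", StringType, false),
    StructField("Last", StringType, false),
    StructField("Url", StringType, false),
    StructField("Published", StringType, false),
    StructField("Hits", IntegerType, false),
    StructField("Campaigns", ArrayType(StringType), false)))

  // Hypothetical helper: read a JSON Lines file with the explicit schema
  // instead of letting Spark infer one.
  def readBlogs(spark: SparkSession, path: String): DataFrame =
    spark.read.schema(schema).json(path)

  def main(args: Array[String]): Unit = {
    if (args.isEmpty) {
      System.err.println("Usage: BlogsWithSchema <path to blogs.json>")
      sys.exit(1)
    }
    // Assumption: local master for a quick test; omit it when using spark-submit.
    val spark = SparkSession.builder()
      .appName("BlogsWithSchema")
      .master("local[*]")
      .getOrCreate()

    val blogsDF = readBlogs(spark, args(0))
    blogsDF.show(false)
    blogsDF.printSchema()
    spark.stop()
  }
}
```

A program in this form is meant to be compiled and submitted as an application rather than loaded into spark-shell.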
Unable to get Scala code to read blogs.json via the Schema definition provided in the book.
[Code]
```scala
package main.scala.chapter3

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.{col, expr}

object Example3_7 {
  def main(args: Array[String]): Unit = {
  }
}
```
[End Code]
I tried both LongType and IntegerType for Hits and Id, and both produced the following error on my machine:
scala> kmontano18@DESKTOP-PRKRT1A:~$ spark-shell -i scalaSchema.scala blogs.json
blogs.json:1: error: identifier expected but integer literal found.
{"Id":1, "First": "Jules", "Last":"Damji", "Url":"https://tinyurl.1", "Published":"1/4/2016", "Hits": 4535, "Campaigns": ["twitter", "LinkedIn"]}
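The error location `blogs.json:1` suggests the shell tried to compile the JSON file itself as Scala, since it was passed as an extra argument after the script. In Scala, `"Id":1` reads like a type ascription, which would explain "identifier expected but integer literal found". A sketch of a script form that may avoid this, assuming blogs.json sits in the working directory: no `package` or `object` wrapper (the REPL generally does not accept package declarations), the path kept inside the script, and the shell started with only `spark-shell -i scalaSchema.scala`. The DDL string is an alternative way to write the same columns; `blogsDDL`, `blogsPath`, and `readBlogs` are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Same columns and types as the book's schema, written as a DDL string.
val blogsDDL =
  "Id INT, First STRING, Last STRING, Url STRING, Published STRING, Hits INT, Campaigns ARRAY<STRING>"

// Assumption: blogs.json is in the directory spark-shell was started from.
val blogsPath = "blogs.json"

// Hypothetical helper; inside spark-shell, pass the predefined `spark` session.
def readBlogs(session: SparkSession, path: String): DataFrame =
  session.read.schema(blogsDDL).json(path)

// In spark-shell:
//   val blogsDF = readBlogs(spark, blogsPath)
//   blogsDF.show(false)
```

Keeping the read behind a function means loading the script does not touch the file until `readBlogs` is called, so a wrong path shows up as a Spark error and not as a script-loading error.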
However, when I read the JSON with Spark's schema inference, it produced the expected schema, albeit with the fields in a different order:
scala> val df = spark.read.json("blogs.json")
df: org.apache.spark.sql.DataFrame = [Campaigns: array<string>, First: string ... 5 more fields]

scala> df.printSchema()
root
 |-- Campaigns: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- First: string (nullable = true)
 |-- Hits: long (nullable = true)
 |-- Id: long (nullable = true)
 |-- Last: string (nullable = true)
 |-- Published: string (nullable = true)
 |-- Url: string (nullable = true)
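The inferred schema differs from the book's in two ways visible in the output above: the fields come back in alphabetical order, and whole numbers are inferred as long. If schema inference is kept, a sketch like the following could reorder and cast the columns afterwards. It builds the DataFrame from the single record quoted in the error message so that it runs without a file; the object name `InferredVsBook` and the helper `toBookLayout` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.IntegerType

object InferredVsBook {
  // Hypothetical helper: put the inferred columns in the book's order
  // and narrow the two long columns to int.
  def toBookLayout(df: DataFrame): DataFrame =
    df.select(
      col("Id").cast(IntegerType),
      col("First"),
      col("Last"),
      col("Url"),
      col("Published"),
      col("Hits").cast(IntegerType),
      col("Campaigns"))

  def main(args: Array[String]): Unit = {
    // Assumption: local master for a quick test.
    val spark = SparkSession.builder()
      .appName("InferredVsBook")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // The record quoted in the error message above.
    val record =
      """{"Id":1, "First": "Jules", "Last":"Damji", "Url":"https://tinyurl.1", "Published":"1/4/2016", "Hits": 4535, "Campaigns": ["twitter", "LinkedIn"]}"""

    // Let Spark infer the schema from the JSON string, then reshape it.
    val inferred = spark.read.json(Seq(record).toDS())
    inferred.printSchema()
    toBookLayout(inferred).printSchema()
    spark.stop()
  }
}
```

Passing an explicit schema to the reader remains the more direct route; this variant is only useful when the inferred DataFrame already exists.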