databricks / spark-xml

XML data source for Spark SQL and DataFrames
Apache License 2.0

Passing a schema when trying to read a nonexistent file masks the error #588

Closed (frgomes closed 2 years ago)

frgomes commented 2 years ago

Trying to read a non-existent file should always fail, but this is not what happens when you pass a schema.

import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.StructType

case class Person(name: String, surname: String)

val source = "/this/file/does/not/exist"
val options = Map("rowTag" -> "people")
val schema = ScalaReflection.schemaFor[Person].dataType.asInstanceOf[StructType]

// this line below fails, which is the expected behavior:
spark.read.format("xml").options(options).load(source)

// this line below succeeds, even though the source file does not exist, which is not expected:
spark.read.format("xml").schema(schema).options(options).load(source)

srowen commented 2 years ago

Agreed, I proposed a fix above, though it's a minor hack.
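
For anyone hitting this on an affected version, one possible caller-side workaround is to check the path explicitly before loading with a schema. This is only a minimal sketch and not the fix proposed in this thread: `loadXmlStrict` is a hypothetical helper name, and it assumes the source is a single path or glob that can be resolved through the Hadoop `FileSystem` API using the session's Hadoop configuration.

```scala
import java.io.FileNotFoundException

import org.apache.hadoop.fs.Path
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.StructType

// Hypothetical helper (not part of spark-xml): fail fast when the source
// path does not exist, even though a schema is supplied.
def loadXmlStrict(
    spark: SparkSession,
    source: String,
    schema: StructType,
    options: Map[String, String]): DataFrame = {
  val path = new Path(source)
  val fs = path.getFileSystem(spark.sparkContext.hadoopConfiguration)

  // globStatus returns null for a missing non-glob path and an empty
  // array for a glob with no matches, so both cases are treated as "missing".
  val found = Option(fs.globStatus(path)).exists(_.nonEmpty)
  if (!found) {
    throw new FileNotFoundException(s"Path does not exist: $source")
  }

  spark.read.format("xml").schema(schema).options(options).load(source)
}
```

With the values from the report above, `loadXmlStrict(spark, source, schema, options)` would throw a `FileNotFoundException` instead of returning a DataFrame for the missing file.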