databricks / spark-xml

XML data source for Spark SQL and DataFrames
Apache License 2.0

ignoreSurroundingSpaces is not working after upgrading to version 0.16.0 #636

Closed: irajhedayati closed this issue 1 year ago.

irajhedayati commented 1 year ago

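After upgrading from 0.14.0 to 0.16.0, my tests started failing: leading and trailing spaces in element values are no longer trimmed, even though ignoreSurroundingSpaces is set to true.

Here is the output of the unit test: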

    == Results ==
    !== Correct Answer - 4 ==   == Spark Answer - 4 ==
     struct<id:string>          struct<id:string>
    ![A]                        [ B]
    ![B]                        [ D ]
    ![C]                        [A]
    ![D]                        [C ]
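The two columns list rows in different orders, which can make the output look more broken than it is. The format appears to come from Spark's QueryTest.checkAnswer, which, for results without an explicit ordering, sorts both row sets by their string form before comparing. The minimal sketch below assumes the values 0.16.0 returned are exactly the untrimmed text of each `<id>` element. It shows that sorting those raw values reproduces the Spark Answer column, because a leading space sorts before any letter, while trimming first gives the Correct Answer column. The object name `TrimCheck` and the hard-coded values are illustrative only.

```scala
// Illustrative sketch only: TrimCheck is a hypothetical name, and rawIds are
// the <id> values from the input file, assumed to be returned untrimmed.
object TrimCheck {
  val rawIds: Seq[String] = Seq("A", " B", "C ", " D ")

  // What ignoreSurroundingSpaces = true should yield: each value trimmed,
  // then sorted the way the test output lists rows.
  def expected(ids: Seq[String]): Seq[String] = ids.map(_.trim).sorted

  def main(args: Array[String]): Unit = {
    // Space (0x20) sorts before letters, so " B" and " D " come first,
    // matching the "Spark Answer" column.
    println(rawIds.sorted.mkString("[", "] [", "]"))
    // Matches the "Correct Answer" column.
    println(expected(rawIds).mkString("[", "] [", "]"))
  }
}
```

Running it prints `[ B] [ D ] [A] [C ]` followed by `[A] [B] [C] [D]`, so the only real difference between the columns is the missing trim.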

Here is how I read the file:

    context.spark.read
      .format("com.databricks.spark.xml")
      .option("mode", "FAILFAST")
      .option("inferSchema", true)
      .option("rootTag", "feed")
      .option("rowTag", "entry")
      .option("treatEmptyValuesAsNulls", true)
      .option("ignoreNamespace", true)
      .option("ignoreSurroundingSpaces", true)
      .load("/path/to/file.xml")

And this is the input file:

    <?xml version="1.0" encoding="UTF-8"?>
    <feed>
        <entry>
            <id>A</id>
        </entry>
        <entry>
            <id> B</id>
        </entry>
        <entry>
            <id>C </id>
        </entry>
        <entry>
            <id> D </id>
        </entry>
    </feed>

srowen commented 1 year ago

Yeah, hm, I see what happened. I'll open a PR for this shortly.