databricks / spark-xml

XML data source for Spark SQL and DataFrames
Apache License 2.0

Tags are misaligned while writing as xml using pyspark dataframes #537

Status: Closed (suganya301 closed this issue 3 years ago)

suganya301 commented 3 years ago

Tags are misaligned while writing as xml using pyspark dataframes.

# Options passed to the spark-xml writer
xml_options = {"rootTag": "document", "rowTag": "book"}
xml_format = "com.databricks.spark.xml"
hdfs_path = "hdfs_path"
save_mode = "overwrite"
# Write a single output file in XML format
dataframe.coalesce(1).write.mode(save_mode).options(**xml_options).save(hdfs_path, format=xml_format)

[Two images were attached here.]

srowen commented 3 years ago

What version are you using? That looks like a problem that was fixed a while ago. I'm looking at the output of some test cases and they don't have that problem.

suganya301 commented 3 years ago

I'm using 0.6.0. Later I tried 0.12.0, and the misalignment seems to be fixed, but nested columns are getting created as one single column on Unix; the Windows side looks fine. For example, the tag country.state.city is getting created as <country.state.city>aa</country.state.city> instead of

<country><state><city>aa</city></state></country>

srowen commented 3 years ago

OK, right. Use the latest. I'm not sure what you mean about nested columns. Could you open a separate issue with more detail?
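
To illustrate the difference between a single dotted element and a nested hierarchy of the kind the country.state.city example refers to, here is a minimal sketch in plain Python. It is not spark-xml's implementation and does not use Spark at all; the helper names `flat_row_to_xml` and `nested_row_to_xml`, and the sample row, are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET


def flat_row_to_xml(row, row_tag="book"):
    """Emit each key as one element, keeping any dots in the tag name."""
    root = ET.Element(row_tag)
    for name, value in row.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def nested_row_to_xml(row, row_tag="book"):
    """Split each dotted key and emit one nested element per path segment."""
    root = ET.Element(row_tag)
    for name, value in row.items():
        node = root
        for part in name.split("."):
            child = node.find(part)
            if child is None:  # reuse an existing parent element if present
                child = ET.SubElement(node, part)
            node = child
        node.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample row with a dotted column name
row = {"country.state.city": "aa"}
print(flat_row_to_xml(row))
print(nested_row_to_xml(row))
```

The first call produces one element whose name contains the dots; the second produces one element per segment. In Spark itself, whether a column is a genuine nested struct or a flat column whose name merely contains dots is visible from `dataframe.printSchema()`, which may be worth including when opening the follow-up issue.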