databricks / spark-xml

XML data source for Spark SQL and DataFrames
Apache License 2.0

Azure Synapse Spark 3.3 Runtime : spark-xml fails on writing xml #666

Closed: thinh-ngu closed this issue 10 months ago

thinh-ngu commented 10 months ago

This uses spark-xml_2.12-0.17.0.jar, uploaded into the Spark environment.

On the Spark 3.3 Azure Synapse Runtime, when attempting to write any XML from a DataFrame using the following command: df.write.format('xml').mode('overwrite').save('/test/xmlfile')

You will encounter this error: Caused by: java.lang.NoClassDefFoundError: com/sun/xml/txw2/output/IndentingXMLStreamWriter
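The class path in a NoClassDefFoundError names exactly what is missing at runtime. A JAR is a zip archive, so one way to check whether any JAR available to the environment provides that class is to look for the matching .class entry. The helper below is a hypothetical diagnostic sketch: find_class_in_jars and the directory you point it at are assumptions for illustration, not part of spark-xml or Azure Synapse.

```python
import zipfile
from pathlib import Path


def find_class_in_jars(missing_class: str, jar_dir: str) -> list[str]:
    """Hypothetical helper: list the JARs under jar_dir that contain missing_class.

    missing_class may be the slash-separated name printed by NoClassDefFoundError
    (e.g. "com/sun/xml/txw2/output/IndentingXMLStreamWriter") or the dotted form.
    """
    # Inside a JAR, a class is stored as <package path>/<ClassName>.class.
    entry = missing_class.replace(".", "/") + ".class"
    hits = []
    # Search the directory recursively for *.jar files, in a stable order.
    for jar in sorted(Path(jar_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(str(jar))
        except zipfile.BadZipFile:
            # Skip files that end in .jar but are not valid zip archives.
            continue
    return hits
```

An empty result for com/sun/xml/txw2/output/IndentingXMLStreamWriter in the directories Spark loads JARs from would be consistent with the error above: the spark-xml JAR is present, but the JAR that provides this class is not.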

This previously worked on the Spark 3.2 Azure Synapse Runtime.

Is there a workaround or fix for spark-xml to accommodate this dependency?

srowen commented 10 months ago

You added only the JAR, not its transitive dependencies. The error is telling you about the additional JARs/classes you need. Normally you do not add dependencies by manually copying JARs; you build them into your application and declare the dependency with Maven or SBT, which resolves the transitive dependencies for you. If it worked previously, it's because that JAR was already present for some reason.
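In PySpark, a comparable way to declare the dependency instead of copying a JAR is the spark.jars.packages setting. It takes comma-separated groupId:artifactId:version Maven coordinates, and Spark resolves them along with their transitive dependencies. The sketch below assumes a self-managed Spark session with access to Maven Central. On Azure Synapse, packages are normally configured at the pool or session level, which this sketch does not cover. The function names are hypothetical, and the assumption that spark-xml's declared dependencies bring in the txw2 classes is not verified here.

```python
def spark_xml_packages_conf(scala_version: str = "2.12",
                            spark_xml_version: str = "0.17.0") -> dict[str, str]:
    """Hypothetical helper: Spark config asking Spark to resolve spark-xml from Maven.

    Unlike a hand-copied JAR, a package listed in spark.jars.packages is resolved
    together with the dependencies declared in its published POM.
    """
    coordinate = f"com.databricks:spark-xml_{scala_version}:{spark_xml_version}"
    return {"spark.jars.packages": coordinate}


def write_xml_example(path: str = "/test/xmlfile") -> None:
    """Hypothetical demo; needs pyspark installed and network access to Maven Central."""
    # Imported lazily so spark_xml_packages_conf works without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("spark-xml-write")
    for key, value in spark_xml_packages_conf().items():
        builder = builder.config(key, value)
    # Packages are only resolved when a new session (and JVM) starts; an already
    # running session returned by getOrCreate() will not pick them up.
    spark = builder.getOrCreate()
    df = spark.range(3)
    # Same write call as in the original report.
    df.write.format("xml").mode("overwrite").save(path)
```

For a compiled application, the equivalent would be declaring com.databricks:spark-xml_2.12:0.17.0 in the Maven or SBT build, so that the build tool pulls in the same transitive dependencies.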

Not an issue with this library.