databricks / spark-xml

XML data source for Spark SQL and DataFrames
Apache License 2.0

Support xs:any in XSD as a 'wildcard' schema element in XML parser #493

Closed by srowen 3 years ago

srowen commented 3 years ago

See https://github.com/databricks/spark-xml/issues/480

This attempts to support a special 'wildcard' schema element, named "xs_any" by default, that matches any child XML elements in a node that the rest of the schema does not match, and treats each one as a string containing its raw XML content. If the wildcard field is a string, it matches one unmatched element. If it is an array of strings, it matches all of them.

This gives "xs:any" in XSDs a natural interpretation.
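To make the described semantics concrete, here is a toy model in plain Python using only the standard library. It is not spark-xml's parser: the `parse_row` helper and the dict-based schema format are hypothetical, invented for illustration. It also assumes that a string-typed wildcard keeps the first unmatched element, which the description above does not specify.

```python
import xml.etree.ElementTree as ET


def parse_row(xml_text, schema, wildcard="xs_any"):
    """Toy model of the wildcard behaviour (hypothetical helper, not spark-xml).

    `schema` maps a field name to "string" or "array<string>".
    Children not named in the schema go to the wildcard field as raw XML.
    """
    root = ET.fromstring(xml_text)
    # Array fields start empty, string fields start unset.
    row = {name: ([] if kind == "array<string>" else None)
           for name, kind in schema.items()}

    for child in root:
        if child.tag in schema and child.tag != wildcard:
            # Regular field: take the element's text content.
            target, value = child.tag, child.text
        elif wildcard in schema:
            # Unmatched element: keep its raw XML as a string.
            child.tail = None  # drop any whitespace that follows the element
            target, value = wildcard, ET.tostring(child, encoding="unicode")
        else:
            # No wildcard field in the schema: unmatched elements are dropped.
            continue

        if schema[target] == "array<string>":
            row[target].append(value)
        elif row[target] is None:
            # Assumption: a string field keeps the first value it sees.
            row[target] = value
    return row


doc = "<row><a>1</a><b>2</b><c><d>3</d></c></row>"
as_array = parse_row(doc, {"a": "string", "xs_any": "array<string>"})
as_string = parse_row(doc, {"a": "string", "xs_any": "string"})
```

With the array-typed wildcard, both `<b>` and `<c>` end up in `xs_any` as raw XML strings. With the string-typed wildcard, only one of them is kept.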

srowen commented 3 years ago

@HyukjinKwon not sure if you'd like to review this - not necessary. My main concern is what I commented on at https://github.com/databricks/spark-xml/issues/480#issuecomment-706338172

srowen commented 3 years ago

I'm going to add this. It won't break existing workloads, and if the functionality works oddly in some cases mentioned above, well, it's still good to have it mostly working.