Closed: dhimmel closed this 2 years ago
Hm. Maybe. It would make it impossible to distinguish attributes from elements when writing rather than reading, so XML wouldn't round-trip correctly. But you're just reading. You can easily rename the cols in a one-liner to remove the prefix. I'm not against it, as it would have to be an explicit choice; I'm just not sure if it causes something else to break...
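For a flat schema the rename really is a one-liner. A sketch of that logic in plain Python, assuming the default "_" prefix and hypothetical column names (in Spark you'd pass the new names to df.toDF(*new_names)):

```python
# Sketch: stripping the default "_" attributePrefix from flat column names.
# The names below are hypothetical: "_id" and "_lang" came from XML
# attributes, "title" from an element.
cols = ["_id", "_lang", "title"]
new_names = [c[1:] if c.startswith("_") else c for c in cols]
print(new_names)  # ['id', 'lang', 'title']
```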
> It would make it impossible to distinguish when writing rather than reading, so XML wouldn't round-trip correctly
Yeah, but there'd still be the "attributePrefix can't be empty" error upon writing, so it would be obvious that it's not round-trippable. For my application, I'm solely interested in getting the data out of the XML!
> You can easily rename the cols in a one liner to remove the prefix.
I don't think it's that easy with deeply nested structs, as per these posts.
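For what it's worth, the nested rename can be sketched as a recursion; the dict/list schema here is a hypothetical stand-in for a nested struct, not Spark's actual StructType API:

```python
def strip_prefix(schema, prefix="_"):
    """Recursively drop a leading attribute prefix from field names in a
    nested dict/list schema (a hypothetical stand-in for nested structs)."""
    if isinstance(schema, dict):
        return {
            (k[len(prefix):] if k.startswith(prefix) else k): strip_prefix(v, prefix)
            for k, v in schema.items()
        }
    if isinstance(schema, list):
        return [strip_prefix(v, prefix) for v in schema]
    return schema

nested = {"_id": 1, "book": {"_lang": "en", "title": "x"}}
print(strip_prefix(nested))  # {'id': 1, 'book': {'lang': 'en', 'title': 'x'}}
```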
Fair point about structs and about being able to control this differently on write. I'm OK with allowing empty on read. Do you have a PR? I can try it next week otherwise.
Resolved in #571
Thanks @srowen for https://github.com/databricks/spark-xml/pull/571. Sorry I didn't get around to responding to your question about submitting a PR. Turns out I don't really know Scala, so it wasn't the easiest for me.
Your help is much appreciated!
Setting
.option("attributePrefix", "")
when reading XML causes the following error:

This check was added in https://github.com/databricks/spark-xml/pull/170 in response to https://github.com/databricks/spark-xml/issues/153 with the following comment:
I'm not sure I understand what is bad about having an empty string for attributePrefix. For example, I'm reading XML where I know that none of the attribute names collide with element names. And providing a prefix for attributes doesn't absolutely prevent collisions with elements, right?
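That last point can be made concrete: with the default prefix "_", an attribute named id becomes column _id, which collides with an element literally named _id. A small sketch with hypothetical names:

```python
# Sketch: a non-empty attributePrefix doesn't absolutely prevent collisions.
prefix = "_"
element_names = {"_id", "title"}   # an element literally named "_id"
attribute_names = {"id"}           # attribute "id" becomes column "_id"
prefixed = {prefix + a for a in attribute_names}
print(element_names & prefixed)  # {'_id'}
```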