Closed big-analytics closed 4 years ago
You probably didn't import com.databricks.spark.xml.schema_of_xml
?
Forgot to say that importing does not work.
It sounds like you do not have the library actually installed on your cluster / with your app at all then. How are you adding it in Databricks? (try reattaching to the cluster after you install)
I think I have it, I can read XML files.
And the library is installed as far as I see.
That's very strange. I just tried attaching the same library to a cluster and it worked, imports and all. Try ... restarting the cluster? Not sure what could be the issue.
YES! That did the trick! So it's possible that the cluster needs a restart after the library is installed. Thanks a lot!
It usually needs at least reattaching the notebook after a JVM library is installed, for Scala. If that doesn't work, then yes, restart. I didn't seem to need that though, FWIW.
I am using Azure Databricks on a single-node cluster with Spark 3.0.0 and Scala 2.12, with the spark-xml library installed: com.databricks:spark-xml_2.12:0.10.0
I am able to parse XML files directly, but I would like to parse XML string columns from a dataframe, so the Nested XML approach seemed the best solution.
I am doing this:
And getting this:
I would appreciate your help.
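
For anyone landing here later, this is a minimal sketch of parsing an XML string column with spark-xml, following the pattern in the library's README. It assumes the library is attached and the notebook has been reattached (or the cluster restarted) so the imports resolve. The sample rows and the column names `id`, `payload` and `parsed` are made up for illustration and are not from the original post.

```scala
import org.apache.spark.sql.SparkSession
import com.databricks.spark.xml.functions.from_xml
import com.databricks.spark.xml.schema_of_xml

// In a Databricks notebook a session already exists; getOrCreate() returns it.
val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// Hypothetical sample data: one XML document per row in a string column "payload".
val df = Seq(
  (1, "<book><title>Spark</title><pages>300</pages></book>"),
  (2, "<book><title>Scala</title><pages>250</pages></book>")
).toDF("id", "payload")

// Infer a schema from the XML strings themselves.
val payloadSchema = schema_of_xml(df.select("payload").as[String])

// Parse each string into a struct column using the inferred schema.
val parsed = df.withColumn("parsed", from_xml($"payload", payloadSchema))

// Nested fields are then reachable with ordinary dot notation.
parsed.select($"id", $"parsed.title", $"parsed.pages").show()
```

If the `schema_of_xml` import still fails with the library listed as installed, the reattach/restart step discussed above is the first thing to try.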