databricks / spark-xml

XML data source for Spark SQL and DataFrames
Apache License 2.0

Failed to find data source: xml. #661

Closed. luisenriqueramos1977 closed this issue 11 months ago

luisenriqueramos1977 commented 12 months ago

In my Databricks project, I installed the JAR file spark_xml_2_12_0_13_0.jar, and when I run the command:

df1=spark.read.format('xml').option('rowTag',tag_to_extract).load(input_xml_file)

I get the error: org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find data source: xml. Please find packages at https://spark.apache.org/third-party-projects.html.

runtime: 11.3

Any recommendation?

Luis Ramos

srowen commented 12 months ago

I am not sure what that JAR is, but chances are it's wrong. You don't install libraries this way in Databricks. Install a "Maven" library and use the coordinates com.databricks:spark-xml_2.12:0.16.0
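For readers reproducing this outside Databricks, here is a minimal sketch of how the same Maven coordinates could be handed to a plain PySpark session. The helper names `spark_xml_coordinate` and `build_session` are hypothetical, and the sketch assumes `pyspark` is installed and that no Spark session is already running, since `spark.jars.packages` only takes effect when the session is created. On Databricks itself, the Maven library install described above is the route to use.

```python
def spark_xml_coordinate(scala_version="2.12", version="0.16.0"):
    """Build the Maven coordinate string for spark-xml.

    The defaults reproduce the coordinate quoted in this thread. The Scala
    suffix is assumed to need to match the Scala version of the cluster.
    """
    return f"com.databricks:spark-xml_{scala_version}:{version}"


def build_session(coordinate):
    """Hypothetical helper: start a local session that resolves the package.

    pyspark is imported lazily so the coordinate helper above can be used
    without it. Assumes no session exists yet in this process.
    """
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .config("spark.jars.packages", coordinate)  # resolved from Maven at startup
        .getOrCreate()
    )


if __name__ == "__main__":
    print(spark_xml_coordinate())
```

Once a session is built this way, the `spark.read.format('xml')` call from the original question is expected to resolve, provided the package download succeeds.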

luisenriqueramos1977 commented 12 months ago

Thanks for your answer.

I installed the library you mentioned and got the same issue.

Luis Ramos.

srowen commented 12 months ago

It works fine for me on Databricks, so I'm not sure what might be different. Check in the cluster UI that the library has actually been installed.
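As a complement to checking the cluster UI, the following sketch wraps the read from the original question so that the "data source not found" failure points back at the library install. The function name `read_xml_or_explain` is hypothetical, and the sketch assumes the Python exception text contains the message quoted in the error above; other failures are re-raised unchanged.

```python
def read_xml_or_explain(spark, path, row_tag):
    """Attempt the XML read and give a clearer hint if spark-xml is missing.

    `spark` is an existing SparkSession (for example the one a Databricks
    notebook provides). Matching on the message text is an assumption based
    on the error reported in this thread.
    """
    try:
        return spark.read.format("xml").option("rowTag", row_tag).load(path)
    except Exception as exc:
        if "Failed to find data source: xml" in str(exc):
            raise RuntimeError(
                "spark-xml does not appear to be available on this cluster. "
                "Check in the cluster UI that the Maven library "
                "com.databricks:spark-xml_2.12:0.16.0 has actually been installed."
            ) from exc
        raise  # unrelated failure: leave it untouched
```

Usage mirrors the original call: `df1 = read_xml_or_explain(spark, input_xml_file, tag_to_extract)`.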