databricks-demos / dbdemos

Demos to implement your Databricks Lakehouse

02-Data-preparation notebook cell "Review our raw Q&A dataset" FAILS, can't figure out workaround! #34

Open Free-Radical opened 1 year ago

Free-Radical commented 1 year ago

The following line (line 3 of the cell) fails:

raw_gardening = spark.read.format("xml").option("rowTag", "row").load(f"{gardening_raw_path}/Posts.xml")

with error: org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find data source: xml. Please find packages at https://spark.apache.org/third-party-projects.html.
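
The error says the cluster has no data source registered under the name xml. As a possible stopgap while that is sorted out, the rows could be read with Python's standard library instead. This is only a sketch and not how the demo itself loads the data: the helper name iter_rows is hypothetical, and it assumes Posts.xml is reachable through the driver's local file API and that each record is a row element whose fields are stored as attributes (matching the rowTag "row" option above).

```python
import io
import xml.etree.ElementTree as ET


def iter_rows(xml_source, row_tag="row"):
    """Yield the attributes of each <row_tag> element as a plain dict.

    xml_source may be a file path or a binary file-like object.
    iter_rows is a hypothetical helper, not part of dbdemos or Spark.
    """
    # iterparse streams the document instead of loading it all at once.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == row_tag:
            yield dict(elem.attrib)  # copy the attributes before clearing
            elem.clear()             # drop the element's contents to limit memory use


if __name__ == "__main__":
    # Tiny in-memory sample standing in for Posts.xml (made-up content).
    sample = io.BytesIO(
        b'<posts><row Id="1" Body="first"/><row Id="2" Body="second"/></posts>'
    )
    for row in iter_rows(sample):
        print(row)
```

The resulting dicts could then be handed to spark.createDataFrame; every value would arrive as a string, so any numeric or date columns would still need casting.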

Free-Radical commented 1 year ago

I am a Databricks customer and have been tasked with developing Dolly-related solutions. It seems the demo notebooks are not up to date or functional, since this is the second error I have encountered. Please recheck all notebooks (especially the Dolly-related ones) to confirm they are still functional, so I can complete my LLM evaluation. Thanks.

QuentinAmbard commented 1 year ago

Hi @Free-Radical, they should be functional; I just tested it this morning before releasing the update. Are you installing the demo with %pip install dbdemos, followed by import dbdemos and dbdemos.install('llm-dolly-chatbot')?

Then you need to run the .sh cell (the one just above the line you shared); it loads the data:

image

Maybe you had an error in this cell? We had a bug that made the cell throw an error, and I released a fix yesterday, so that could be the reason. Let me know.
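
One way to test the hypothesis that the download cell failed is to look at whether the file it should have produced is actually there. The sketch below is only an illustration: describe_raw_file is a hypothetical helper, and it assumes the raw path is visible to the driver's local file API (on Databricks that may mean a /dbfs-prefixed path, which is an assumption here). It says nothing about whether the xml data source itself is installed.

```python
import os


def describe_raw_file(path):
    """Return a short status string for the file at path.

    describe_raw_file is a hypothetical helper for illustration only.
    """
    if not os.path.exists(path):
        return "missing"
    size = os.path.getsize(path)
    if size == 0:
        return "empty"
    return f"present ({size} bytes)"
```

For example, calling describe_raw_file(f"{gardening_raw_path}/Posts.xml") and getting "missing" or "empty" would point at the download cell rather than at the read itself.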

QuentinAmbard commented 1 year ago

@Free-Radical any update on that?

Free-Radical commented 1 year ago

> @Free-Radical any update on that?

Hi @QuentinAmbard, thank you for checking back. I was working on another related task; I'm back now and will begin tonight and provide feedback. Thanks again.