PacktPublishing / Databricks-ML-In-Action

Databricks ML in Action, Published by Packt
MIT License

Issue in CH2-01-Generating Records Using DBKS Labs Datagen.py related to JVM dependency due to Spark Connect #90

Open tamaskerekjarto opened 2 weeks ago

tamaskerekjarto commented 2 weeks ago

After running the WriteJasonFile function in cell 12 of the Chapter 2: Designing Databricks Day One/Project: Streaming Transactions/CH2-01-Generating Records Using DBKS Labs Datagen.py notebook, I get the following error message: [JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `_jdf` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail.

After some research, it looks like the reduce call in the generateRecordSet() function is causing the issue, because the way it is written is not supported by Spark Connect. I had to change `return reduce(pyspark.sql.dataframe.DataFrame.unionByName, recordSet)` to an explicit loop: first `result = recordSet[0]`, then `for df in recordSet[1:]: result = result.unionByName(df)`, and finally `return result`.
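For anyone who would rather keep `reduce`, here is a minimal sketch of an alternative, assuming `recordSet` is a non-empty list of DataFrames. My guess (not verified against the Spark source) is that the original call fails because it passes the classic `pyspark.sql.dataframe.DataFrame.unionByName` as an unbound method, which reaches for `_jdf` on a Spark Connect DataFrame. Calling `unionByName` on each instance sidesteps that. The helper name `union_by_name_all` is made up for illustration and is not part of the notebook.

```python
from functools import reduce


def union_by_name_all(record_set):
    """Union a non-empty list of DataFrames by column name.

    Hypothetical helper: each call is made on the DataFrame instance,
    so the method is looked up on whatever class the session returned
    instead of on pyspark.sql.dataframe.DataFrame.
    """
    if not record_set:
        raise ValueError("record_set must contain at least one DataFrame")
    # left.unionByName(right) dispatches on left's own class.
    return reduce(lambda left, right: left.unionByName(right), record_set)
```

Inside generateRecordSet() this would replace the last line with `return union_by_name_all(recordSet)`. It should behave the same as the explicit loop above; it only keeps the one-line shape of the original.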

github-actions[bot] commented 2 weeks ago

Thank you for submitting your first issue to our repo! Please consider creating a pull request to address it.