Issue in CH2-01-Generating Records Using DBKS Labs Datagen.py related to JVM dependency due to Spark Connect

After running the WriteJasonFile function in cell 12 of the Chapter 2: Designing Databricks Day One/Project: Streaming Transactions/CH2-01-Generating Records Using DBKS Labs Datagen.py notebook, I get the following error message:

[JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `_jdf` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail.

After some research, it looks like the reduce call in generateRecordSet() is causing the issue. It passes pyspark.sql.dataframe.DataFrame.unionByName, the classic JVM-backed DataFrame method, which appears to read the `_jdf` attribute that Spark Connect DataFrames do not support. I had to change

    return reduce(pyspark.sql.dataframe.DataFrame.unionByName, recordSet)

to

    result = recordSet[0]
    for df in recordSet[1:]:
        result = result.unionByName(df)
    return result
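If keeping reduce is preferred, a minimal sketch of an equivalent is below. It assumes recordSet is a non-empty list of DataFrames with matching column names; `union_by_name_all` is a hypothetical helper name that is not in the notebook. The idea is to call unionByName on each DataFrame instance instead of going through the classic pyspark.sql.dataframe.DataFrame class, so the same code should work whether or not the session uses Spark Connect.

```python
from functools import reduce


def union_by_name_all(record_set):
    """Union a non-empty list of DataFrames by column name.

    Hypothetical helper (not part of the notebook). It calls unionByName
    on each instance rather than on the classic DataFrame class, so it
    does not depend on which DataFrame implementation the session returns.
    """
    if not record_set:
        raise ValueError("record_set must contain at least one DataFrame")
    # The lambda dispatches on the left-hand object, so each DataFrame
    # uses its own unionByName implementation.
    return reduce(lambda left, right: left.unionByName(right), record_set)
```

In generateRecordSet() the last line would then read `return union_by_name_all(recordSet)`. I have only reasoned about this variant, not run it against the notebook, so the explicit loop above remains the confirmed fix.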

PacktPublishing / Databricks-ML-In-Action #90