Azure / spark-cdm-connector

MIT License

Databricks 10.5/Spark 3.2.1: java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport #94

Closed NitinSingh12 closed 2 years ago

NitinSingh12 commented 2 years ago

While running the code below on Databricks (Databricks Runtime Version = 10.5, which includes Apache Spark 3.2.1 and Scala 2.12), the library we installed does not work with the current cluster, although it works with the 6.4 runtime version.

Error: `java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport`
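For context, `ReadSupport` belonged to the Spark 2.x DataSource V2 package `org.apache.spark.sql.sources.v2`, which Spark 3.0 removed when the API was redesigned under `org.apache.spark.sql.connector`. A connector jar built against Spark 2.4 (the Databricks 6.4 runtime) therefore cannot load on Spark 3.2.1. The version mismatch can be sketched with a small illustrative helper (hypothetical, not part of the connector):

```python
# Illustrative only: the installed connector build must target the same Spark
# major.minor line as the runtime. Databricks 6.4 ships Spark 2.4; Databricks
# 10.5 ships Spark 3.2, where org.apache.spark.sql.sources.v2 no longer exists.

def connector_matches_runtime(spark_version, supported_lines):
    """Return True if the runtime's Spark major.minor line is one the
    installed connector build was compiled against."""
    major_minor = ".".join(spark_version.split(".")[:2])
    return major_minor in supported_lines

# A build that only supports the 2.4 line will not match Databricks 10.5:
print(connector_matches_runtime("3.2.1", {"2.4"}))  # False
print(connector_matches_runtime("2.4.5", {"2.4"}))  # True
```

Loading a 2.4-only build on a 3.2 runtime is exactly the situation that surfaces as `NoClassDefFoundError` for the removed interface.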

Write a CDM entity with Parquet data files; the entity definition is derived from the DataFrame schema:

```python
from datetime import datetime
from decimal import Decimal

from pyspark.sql.types import (StructType, StructField, StringType, IntegerType,
                               BooleanType, DoubleType, LongType, DateType,
                               TimestampType, DecimalType)

d = datetime.strptime("2015-03-31", '%Y-%m-%d')
ts = datetime.now()
data = [
    ["a", 1, True, 12.34, 6, d, ts, Decimal(1.4337879), Decimal(999.00), Decimal(18.8)],
    ["b", 1, True, 12.34, 6, d, ts, Decimal(1.4337879), Decimal(999.00), Decimal(18.8)],
]

schema = (StructType()
    .add(StructField("name", StringType(), True))
    .add(StructField("id", IntegerType(), True))
    .add(StructField("flag", BooleanType(), True))
    .add(StructField("salary", DoubleType(), True))
    .add(StructField("phone", LongType(), True))
    .add(StructField("dob", DateType(), True))
    .add(StructField("time", TimestampType(), True))
    .add(StructField("decimal1", DecimalType(15, 3), True))
    .add(StructField("decimal2", DecimalType(38, 7), True))
    .add(StructField("decimal3", DecimalType(5, 2), True))
)

df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
```
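An aside on the sample data, unrelated to the reported error: constructing a `Decimal` from a float literal (as in `Decimal(1.4337879)`) captures the float's binary rounding, while a string argument preserves the literal digits exactly. This can matter when values are written into a `DecimalType` column:

```python
from decimal import Decimal

# Decimal(float) inherits the float's inexact binary representation;
# Decimal(str) stores the digits exactly as written.
print(Decimal(1.4337879))    # a long, inexact expansion of the float
print(Decimal("1.4337879"))  # exactly 1.4337879
```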

Create the CDM manifest and add the entity to it with gzip'd Parquet partitions, with both physical and logical entity definitions:

```python
(df.write.format("com.microsoft.cdm")
    .option("storage", storageAccountName)
    .option("manifestPath", container + "/implicitTest/default.manifest.cdm.json")
    .option("entity", "TestEntity")
    .option("format", "parquet")
    .option("compression", "gzip")
    .save())
```

Append the same DataFrame content to the entity in the default CSV format:

```python
(df.write.format("com.microsoft.cdm")
    .option("storage", storageAccountName)
    .option("manifestPath", container + "/implicitTest/default.manifest.cdm.json")
    .option("entity", "TestEntity")
    .mode("append")
    .save())
```

```python
readDf = (spark.read.format("com.microsoft.cdm")
    .option("storage", storageAccountName)
    .option("manifestPath", container + "/implicitTest/default.manifest.cdm.json")
    .option("entity", "TestEntity")
    .load())

readDf.select("*").show()
```

We need your help pointing us to the right library build so that we can create entity tables in Databricks.

kcheeeung commented 2 years ago

See issue #92.