Closed NitinSingh12 closed 2 years ago
See issue #92.
While running the code below in Databricks (Databricks Runtime Version 10.5, which includes Apache Spark 3.2.1 and Scala 2.12), the library we installed does not work on the current cluster, although it does work on the 6.4 runtime.

Error: `java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport`
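The stack trace names `org.apache.spark.sql.sources.v2.ReadSupport`, a DataSource V2 interface that existed in Spark 2.x and was removed in Spark 3.0 (the V2 API moved to the `org.apache.spark.sql.connector` package). A connector jar compiled against the Spark 2.x interface therefore loads on DBR 6.4 (Spark 2.4) but not on DBR 10.5 (Spark 3.2.1). A minimal sketch of that compatibility rule, with the runtime-to-Spark mapping taken from the versions reported above:

```python
# Map the Databricks runtimes mentioned in this report to their Spark versions
# (6.4 ships Spark 2.4, 10.5 ships Spark 3.2).
RUNTIME_SPARK = {"6.4": (2, 4), "10.5": (3, 2)}

def connector_loads(runtime: str) -> bool:
    """A jar built against org.apache.spark.sql.sources.v2 only links on Spark 2.x,
    since that package was removed in Spark 3.0."""
    major, _minor = RUNTIME_SPARK[runtime]
    return major < 3

print(connector_loads("6.4"))   # works on the 6.4 runtime
print(connector_loads("10.5"))  # NoClassDefFoundError on 10.5
```

So the fix is not in the calling code: a build of the connector targeting Spark 3.x is needed for this runtime.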
Write a CDM entity with Parquet data files; the entity definition is derived from the dataframe schema:
```python
from datetime import datetime
from decimal import Decimal
from pyspark.sql.types import (
    StructType, StructField, StringType, IntegerType, BooleanType,
    DoubleType, LongType, DateType, TimestampType, DecimalType,
)

d = datetime.strptime("2015-03-31", "%Y-%m-%d")
ts = datetime.now()
data = [
    ["a", 1, True, 12.34, 6, d, ts, Decimal(1.4337879), Decimal(999.00), Decimal(18.8)],
    ["b", 1, True, 12.34, 6, d, ts, Decimal(1.4337879), Decimal(999.00), Decimal(18.8)],
]

schema = (StructType()
    .add(StructField("name", StringType(), True))
    .add(StructField("id", IntegerType(), True))
    .add(StructField("flag", BooleanType(), True))
    .add(StructField("salary", DoubleType(), True))
    .add(StructField("phone", LongType(), True))
    .add(StructField("dob", DateType(), True))
    .add(StructField("time", TimestampType(), True))
    .add(StructField("decimal1", DecimalType(15, 3), True))
    .add(StructField("decimal2", DecimalType(38, 7), True))
    .add(StructField("decimal3", DecimalType(5, 2), True))
)

df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
```
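One detail worth noting in the sample data, though it is unrelated to the runtime error: `Decimal(1.4337879)` converts a binary float, so the resulting value carries float rounding artifacts rather than the literal digits; passing the digits as a string preserves them exactly. Spark still rounds to each column's declared scale on write, so this is a precision nuance only. A small standalone illustration:

```python
from decimal import Decimal

# Constructing from a float preserves the float's exact binary expansion ...
from_float = Decimal(1.4337879)
# ... while constructing from a string preserves the written digits.
from_string = Decimal("1.4337879")

print(from_float == from_string)              # False: the values differ slightly
print(from_float.quantize(Decimal("0.001")))  # both round to 1.434 at scale 3
print(from_string.quantize(Decimal("0.001")))
```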
Creates the CDM manifest and adds the entity to it with gzip'd Parquet partitions, with both physical and logical entity definitions:
```python
(df.write.format("com.microsoft.cdm")
    .option("storage", storageAccountName)
    .option("manifestPath", container + "/implicitTest/default.manifest.cdm.json")
    .option("entity", "TestEntity")
    .option("format", "parquet")
    .option("compression", "gzip")
    .save())
```
Append the same dataframe content to the entity in the default CSV format:

```python
(df.write.format("com.microsoft.cdm")
    .option("storage", storageAccountName)
    .option("manifestPath", container + "/implicitTest/default.manifest.cdm.json")
    .option("entity", "TestEntity")
    .mode("append")
    .save())
```
```python
readDf = (spark.read.format("com.microsoft.cdm")
    .option("storage", storageAccountName)
    .option("manifestPath", container + "/implicitTest/default.manifest.cdm.json")
    .option("entity", "TestEntity")
    .load())

readDf.select("*").show()
```
We need your help pointing us to the right library so that we can create entity tables in Databricks.