Azure / spark-cdm-connector

MIT License

Support Apache Spark 3.0 #57

Closed TissonMathew closed 1 year ago

TissonMathew commented 3 years ago

Spark 3.0 is still not supported as of 0.18.1 (public preview).

aowens-jmt commented 2 years ago

Can you share how to read an entity with Azure Databricks?

Here's an example (I'm using Scala) of how I'm reading the data, which has been exported into an ADLS Gen2 account:

var sourceEntity = spark.read.format("com.microsoft.cdm")
  .option("storage", s"$sourceAccount.dfs.core.windows.net")
  .option("manifestPath", s"$sourceZone/model.json")
  .option("entity", source)
  .option("appId", helpers.getClientId)
  .option("appKey", helpers.getClientSecret)
  .option("tenantId", helpers.getTenantId)
  .option("mode", "permissive")
  .load()

Wherever you see "helpers.", that is just a wrapper I created to fetch values referenced from Key Vault. The variables are:

$sourceZone is the name of the container in ADLS where my data resides
$source is the name of the entity I'm reading
$sourceAccount is the name of the ADLS storage account
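
For anyone on PySpark rather than Scala, the same read can be sketched as below. This is a minimal sketch, not the connector's documented API: the option names are copied from the Scala snippet above, while `cdm_read_options` and `read_entity` are hypothetical helper names, and the credentials are assumed to come from wherever you keep secrets.

```python
def cdm_read_options(account, container, entity, app_id, app_key, tenant_id):
    """Build the same option map as the Scala example (hypothetical helper)."""
    return {
        "storage": f"{account}.dfs.core.windows.net",  # full ADLS Gen2 host name
        "manifestPath": f"{container}/model.json",      # container, then the manifest file
        "entity": entity,
        "appId": app_id,
        "appKey": app_key,
        "tenantId": tenant_id,
        "mode": "permissive",
    }


def read_entity(spark, options):
    """Read one entity; assumes `spark` is a SparkSession with the connector jar attached."""
    return spark.read.format("com.microsoft.cdm").options(**options).load()
```

In a notebook this would be called as `read_entity(spark, cdm_read_options(...))`. Building the options separately makes it easy to print and inspect them (minus the secret) before the read is attempted.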

bit007 commented 2 years ago

How do I write to ADLS Gen2 using Databricks? I am using Databricks 10.4 LTS and trying the custom jar https://github.com/Azure/spark-cdm-connector/blob/master/artifacts/spark-cdm-connector-spark3-assembly-databricks-cred-passthrough-not-working-1.19.2.jar, but I am getting an error when using PySpark.

itsanurag5 commented 1 year ago

With the above jar, I am getting the error below. Can someone please help?

AnalysisException: Manifest doesn't exist: default.manifest.cdm.json

My code is:

df.write.format("com.microsoft.cdm").option("storage", storageAccountName).option("manifestPath", container + "/cdm/default.manifest.cdm.json").option("entity", "Employee").option("format", "parquet").option("appId", appid).option("appKey", appkey).option("tenantId", tenantid).mode("overwrite").save()
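
One thing that may be worth ruling out before digging further is the shape of the `storage` and `manifestPath` values. The sketch below only compares them with the shapes used in the read example earlier in this thread (a full `<account>.dfs.core.windows.net` host, and a path that starts with the container name). It is not a confirmed cause of the `AnalysisException`, and `check_cdm_options` is a hypothetical helper, not part of the connector.

```python
def check_cdm_options(storage, manifest_path):
    """Return a list of suspicious things about the two values; empty means nothing obvious."""
    problems = []
    # The read example in this thread passes the full dfs host, not just the account name.
    if not storage.endswith(".dfs.core.windows.net"):
        problems.append("storage is not a full <account>.dfs.core.windows.net host name")
    # The read example uses "<container>/<manifest file>", so expect at least two segments.
    parts = [p for p in manifest_path.split("/") if p]
    if len(parts) < 2:
        problems.append("manifestPath does not start with a container name")
    elif not (parts[-1].endswith(".manifest.cdm.json") or parts[-1] == "model.json"):
        problems.append("manifestPath does not end in a manifest or model.json file")
    return problems
```

For example, `check_cdm_options(storageAccountName, container + "/cdm/default.manifest.cdm.json")` returns a non-empty list if `storageAccountName` holds only the bare account name.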

verargulla commented 1 year ago

While I'm getting the dreaded error: java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport

I've tried both 11.3 and 10.4 LTS runtimes using spark_cdm_connector_assembly_0_19_1.jar. Any clues on this? :)
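
For context on that error: `org.apache.spark.sql.sources.v2.ReadSupport` belongs to the DataSource V2 API that shipped in Spark 2.3 and 2.4 and was removed in Spark 3.0, and Databricks Runtime 10.4 LTS and 11.3 LTS run Spark 3.2 and 3.3 respectively. A jar that references this class cannot load on those runtimes. The sketch below is a small check you could run in a notebook with `spark.version`; `has_legacy_read_support` is a hypothetical helper name.

```python
def has_legacy_read_support(spark_version):
    """True if this Spark version ships org.apache.spark.sql.sources.v2.ReadSupport.

    The interface existed only in the Spark 2.3 and 2.4 lines.
    """
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    return (major, minor) in {(2, 3), (2, 4)}
```

If `has_legacy_read_support(spark.version)` is false, a connector build that targets Spark 3 is needed, such as the spark3 assembly jar mentioned earlier in this thread.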

kecheung commented 1 year ago

Please read the pinned issues.