Azure / spark-cdm-connector

MIT License

default.manifest.cdm.json not getting created #89

Closed wiyi123 closed 2 years ago

wiyi123 commented 2 years ago

I'm using PySpark with Apache Spark 2.4 on Azure Synapse Analytics. I'm trying to run the sample code (https://github.com/Azure/spark-cdm-connector/blob/master/samples/SparkCDMsamplePython.ipynb) provided in this repo to write a simple DataFrame to CDM format in an ADLS Gen2 path.

When I run this

(df.write.format("com.microsoft.cdm")
    .option("storage", storageAccountName)
    .option("manifestPath", manifestPath)
    .option("entity", entityName)
    .option("format", "parquet")
    .mode("Overwrite")
    .save())


the parquet files get written to the target folder, but default.manifest.cdm.json doesn't get generated like it should.

Why is this happening? I've tried going through all the documentation and can't figure it out.

srichetar commented 2 years ago

Can you check the driver logs for any errors?

wiyi123 commented 2 years ago

I figured out the issue. I had to pass the appId and appSecret explicitly as options on the write, because my Synapse notebook was picking up the wrong credentials that I had set in my Spark config.
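For anyone hitting the same thing, here is a minimal sketch of what passing the credentials explicitly might look like. The helper name `build_cdm_options` and all placeholder values are hypothetical. The credential keys `appId`, `appKey`, and `tenantId` are the names I recall from the connector's documentation; the post above calls the secret appSecret, so treat the exact key names as an assumption and verify them against the README for your connector version.

```python
def build_cdm_options(storage, manifest_path, entity, app_id, app_key, tenant_id):
    """Collect the connector options, including explicit credentials, in one dict.

    Hypothetical helper for illustration; not part of the connector.
    """
    return {
        "storage": storage,
        "manifestPath": manifest_path,
        "entity": entity,
        "format": "parquet",
        # Assumed credential option keys; confirm against the connector README.
        "appId": app_id,
        "appKey": app_key,
        "tenantId": tenant_id,
    }


# Placeholder values only; substitute your own.
options = build_cdm_options(
    storage="<storage-account>.dfs.core.windows.net",
    manifest_path="<container>/<folder>/default.manifest.cdm.json",
    entity="<entity-name>",
    app_id="<service-principal-app-id>",
    app_key="<service-principal-secret>",
    tenant_id="<tenant-id>",
)

# In a notebook where `df` is an existing DataFrame, the write would then be:
# (df.write.format("com.microsoft.cdm")
#     .options(**options)
#     .mode("Overwrite")
#     .save())
```

Passing the credentials on the write itself means they no longer depend on whatever happens to be set in the session's Spark config, which appears to be what went wrong here. In practice the secret would be better read from a secret store than typed into the notebook.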