Azure-Samples / cdm-azure-data-services-integration

Tutorials and sample code for integrating CDM folders with Azure Data Services
MIT License

Databricks/Datalake - Writing simple dataset to CDM gives: Write job aborted #17

Open AntonioSpalluto opened 4 years ago

AntonioSpalluto commented 4 years ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [x] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Code sample:

```scala
val consolidatedSessions = spark.sql("SELECT * FROM sessions limit 100")

consolidatedSessions.write.format("com.microsoft.cdm")
  .option("entity", "Sessions")
  .option("appId", clientId)
  .option("appKey", secret)
  .option("tenantId", tenantId)
  .option("cdmFolder", cdmDataLakeFolder)
  .option("cdmModelName", cdmModelName)
  .save()
```
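A diagnostic sketch, not from the original report: Spark's `Writing job aborted` exception usually wraps the real failure, so walking the cause chain on the driver can surface the underlying error from the CDM writer. It reuses the same variables (`consolidatedSessions`, `clientId`, etc.) as the sample above.

```scala
import scala.util.{Failure, Success, Try}

// Sketch: re-run the write and walk the exception's cause chain,
// since "Writing job aborted" typically wraps the actual failure.
Try {
  consolidatedSessions.write.format("com.microsoft.cdm")
    .option("entity", "Sessions")
    .option("appId", clientId)
    .option("appKey", secret)
    .option("tenantId", tenantId)
    .option("cdmFolder", cdmDataLakeFolder)
    .option("cdmModelName", cdmModelName)
    .save()
} match {
  case Success(_) => println("Write succeeded")
  case Failure(e) =>
    var cause: Throwable = e
    while (cause.getCause != null && cause.getCause != cause) cause = cause.getCause
    println(s"Root cause: ${cause.getClass.getName}: ${cause.getMessage}")
}
```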

After this exception, the job has created only a snapshot file; the model.json was never written.
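One way to confirm this partial write from a Databricks notebook (a sketch, assuming `cdmDataLakeFolder` is a mount point or path that `dbutils` can reach):

```scala
// Sketch: list the CDM folder to see what the aborted job left behind.
val files = dbutils.fs.ls(cdmDataLakeFolder)
files.foreach(f => println(f.path))

// Check whether the entity definition file was written at all.
val hasModelJson = files.exists(_.name == "model.json")
println(s"model.json present: $hasModelJson")
```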

Any log messages given by the failure

```
org.apache.spark.SparkException: Writing job aborted.
  at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:92)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:146)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:134)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$5.apply(SparkPlan.scala:187)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:183)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:134)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:114)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:114)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:710)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:710)
  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withCustomExecutionEnv$1.apply(SQLExecution.scala:111)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:240)
  at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:97)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:170)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:710)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:281)
  at lineb7a43bd5cad8496aa5f3fefa36dd780933.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-2525265643147671:10)
  at lineb7a43bd5cad8496aa5f3fefa36dd780933.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-2525265643147671:62)
  at lineb7a43bd5cad8496aa5f3fefa36dd780933.$read$$iw$$iw$$iw$$iw.<init>(command-2525265643147671:64)
  at lineb7a43bd5cad8496aa5f3fefa36dd780933.$read$$iw$$iw$$iw.<init>(command-2525265643147671:66)
```

Expected/desired behavior

All the data is saved properly in the data lake, including the model.json.

OS and Version

Databricks Runtime 5.5 LTS (includes Apache Spark 2.4.3, Scala 2.11)
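For completeness, the runtime can be confirmed from the notebook itself (a small sketch; `spark` is the session object Databricks provides):

```scala
// Print the Spark and Scala versions of the attached cluster.
println(s"Spark: ${spark.version}")                       // expect 2.4.3 on DBR 5.5 LTS
println(s"Scala: ${scala.util.Properties.versionString}") // expect version 2.11.x
```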

man18786 commented 3 years ago

Facing the same issue. Any update on this?