Azure / spark-cdm-connector

MIT License

java.lang.Exception: CdmCorpusDefinition | Duplicate declaration for item: '/entityName/LogicalDefinition/entityName.cdm.json/entityName/hasAttributes/columnName' #80

Closed: absognety closed this issue 3 years ago

absognety commented 3 years ago

Hi, when I try to write to a storage account in CDM format with the following piece of code:

(entity_df.write.format("com.microsoft.cdm")
  .option("storage", storage_account_name + '.dfs.core.windows.net')
  .option("manifestPath", storage_container_name + '/' + output_folder_name + "/" + "default.manifest.cdm.json")
  .option("entity", entity_name)
  .option("appId", app_id)
  .option("appKey", app_key)
  .option("tenantId", tenant_id)
  .option("format", output_format.lower())
  .mode("overwrite")
  .option("columnHeaders", False)
  .save())

I started getting this exception. It says there is a duplicate declaration of an item in that entity and names a column, but I checked the data and the values of columnName do not contain duplicates. How should I interpret this?

P.S.: entityName and columnName here are placeholders; I redacted the real names for privacy reasons.

Using Spark CDM connector 0.19.1

absognety commented 3 years ago

There were duplicate columns in my Delta table after the transformation, before writing to CDM. I removed them, which fixed this issue. Thanks anyway.
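For anyone who hits the same error, here is a minimal sketch of a check for repeated column names that can run before the write. It only inspects a list of names such as `entity_df.columns`; the helper name `find_duplicate_columns` and its `case_sensitive` flag are hypothetical and not part of the connector. Whether the connector compares attribute names case-sensitively is not confirmed in this thread, so the flag is an assumption you may want to try both ways.

```python
from collections import Counter


def find_duplicate_columns(columns, case_sensitive=True):
    """Return the column names that occur more than once, in first-seen order.

    Hypothetical helper, not part of the Spark CDM connector.
    With case_sensitive=False, names are lowercased before comparison
    and the lowercased form is what gets returned.
    """
    keys = [c if case_sensitive else c.lower() for c in columns]
    counts = Counter(keys)
    seen = set()
    duplicates = []
    for key in keys:
        if counts[key] > 1 and key not in seen:
            seen.add(key)
            duplicates.append(key)
    return duplicates


# Assumed usage with the DataFrame from the question (requires a Spark session):
# dupes = find_duplicate_columns(entity_df.columns)
# if dupes:
#     raise ValueError(f"Duplicate columns before CDM write: {dupes}")
```

Note that this looks at the column names of the DataFrame (its schema), not at the values stored in any column, which is the distinction the error message is pointing at.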