Closed SpanTag85 closed 1 year ago
Others have run into the same error as you. Maybe you should search and check those first. https://github.com/Azure/spark-cdm-connector/issues/84 https://github.com/Azure/spark-cdm-connector/issues/86
A search will easily yield results. https://github.com/Azure/spark-cdm-connector/search?q=The+number+of+columns+in+CSV%2Fparquet&type=issues
Comment or reopen if you still have issues.
Did you read the pinned issues and search the error message?
Summary of issue
I have created a Python script in a Databricks notebook that gets all the manifest files, reads the JSON, and builds a list of all available tables/entities to read. The spark-cdm-connector loops through all the manifests, but it gets stuck on at least one entity ("Tables/Custom/Custom.manifest.cdm.json", EntityName: "HCMPERSONDETAILS") with this exception:
The number of columns in CSV/parquet file is not equal to the number of fields in Spark StructType. Either modify the attributes in manifest to make it equal to the number of columns in CSV/parquet files or modify the csv/parquet file
I cannot find a way to get around this, since I am not able to modify either the manifest or the CSV files created by the D365 export to data lake.
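For anyone hitting the same wall, the two steps described above can be sketched in plain Python: listing entity names from a CDM manifest document, and checking whether a data file's column count matches the attribute count the manifest declares (the mismatch the connector complains about). This is a hedged illustration only — the sample manifest, attribute names, and helper functions below are made up for the example, not the connector's actual API:

```python
import csv
import io
import json

# Minimal fake manifest document following the CDM manifest schema
# ("entities" / "entityName" keys); the real file from the D365 export
# will contain many more fields.
sample_manifest = json.dumps({
    "manifestName": "Custom",
    "entities": [
        {"type": "LocalEntity",
         "entityName": "HCMPERSONDETAILS",
         "entityPath": "HCMPERSONDETAILS.cdm.json/HCMPERSONDETAILS"}
    ]
})

def list_entities(manifest_text):
    """Return the entity names declared in a manifest document."""
    doc = json.loads(manifest_text)
    return [e["entityName"] for e in doc.get("entities", [])]

def column_count_matches(csv_text, attribute_names):
    """Compare the first CSV row's column count to the declared attributes."""
    first_row = next(csv.reader(io.StringIO(csv_text)))
    return len(first_row) == len(attribute_names)

print(list_entities(sample_manifest))  # ['HCMPERSONDETAILS']

# A 3-column row against 4 declared (hypothetical) attributes reproduces
# the kind of mismatch behind the "number of columns" error:
print(column_count_matches("1,Alice,2020-01-01\n",
                           ["RecId", "Name", "ValidFrom", "ValidTo"]))  # False
```

Running a check like this over each entity before handing it to the connector can at least identify which entities will fail, so the loop can skip them instead of aborting.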
Error stack trace
Platform name
Azure Databricks
Spark version
Spark 3.3.0
CDM jar version
spark_cdm_connector_assembly_synapse_spark3_2_1_19_4.jar
What is the format of the data you are trying to read/write?
.csv