Azure / spark-cdm-connector

MIT License

Schema drift: The number of columns in CSV/parquet file is not equal to the number of fields in Spark StructType. #86

Closed rgk85 closed 2 years ago

rgk85 commented 2 years ago

I have encountered an issue where the schema includes more columns than my actual data. Reading fails with an error that says the same:

The number of columns in CSV/parquet file is not equal to the number of fields in Spark StructType. Either modify the attributes in manifest to make it equal to the number of columns in CSV/parquet files or modify the csv/parquet file
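The condition behind this message can be illustrated outside Spark. The following is a minimal sketch in plain Python, not the connector's own code or API: the helper `find_column_mismatches`, the field list, and the sample data are all hypothetical, and it assumes a comma-delimited file with no header row. It reports every row whose column count differs from the number of fields the schema declares.

```python
import csv
import io


def find_column_mismatches(csv_text, expected_fields, delimiter=","):
    """Return (row_number, actual_column_count) for each row whose
    width differs from the number of expected schema fields."""
    mismatches = []
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    for row_number, row in enumerate(reader, start=1):
        if len(row) != len(expected_fields):
            mismatches.append((row_number, len(row)))
    return mismatches


# Hypothetical entity with three attributes declared in the manifest.
schema_fields = ["id", "name", "amount"]

# Hypothetical data file content: the second row lacks the third column.
sample = "1,alice,10.5\n2,bob\n"

mismatches = find_column_mismatches(sample, schema_fields)
print(mismatches)  # [(2, 2)]
```

Running a check like this against the real data files shows which rows, if any, disagree with the attribute count in the manifest before the read is attempted.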


I read the documentation and its list of unsupported scenarios. As far as I understand, the scenario where the actual data has more columns than specified in the schema is not supported. Am I missing something in the documentation, am I doing something wrong, or is there a workaround?

Environment: spark-cdm-connector 0.19.1, Databricks 6.4, Spark 2.4.5

Nuglar commented 2 years ago

This may be related to the issue we were facing, discussed in a previous issue: https://github.com/Azure/spark-cdm-connector/issues/84

Things to try: