The Automated Data Analytics on AWS solution provides an end-to-end data platform for ingesting, transforming, managing and querying datasets. This helps analysts and business users manage and gain insights from data without deep technical experience using Amazon Web Services (AWS).
Apache License 2.0
Querying a newly created data product fails with error "HIVE_INVALID_METADATA: Hive metadata for table xxx is invalid: Table descriptor contains duplicate columns" #41
SYMPTOM
When querying a data product that was created, or had data ingested, around or after April 2023, the query fails with the following error message:
HIVE_INVALID_METADATA: Hive metadata for table xxx is invalid: Table descriptor contains duplicate columns
This happens on both existing and new deployments of release 1.1.0 and below.
CAUSE
Around April 2023, AWS Glue Crawler changed its behaviour: the crawler now automatically creates a Partition Index after it has crawled the data and created a table in the Glue Data Catalog. As a result, the step in this solution that is supposed to update the partition fields after the crawler creates the table fails. This leaves the table metadata invalid, so Athena cannot query the table.
SOLUTION
This issue has been resolved in Release v1.2.0. Upgrading an existing deployment to Release v1.2.0 resolves the issue for newly created data products. An existing data product might need to be removed and its data re-imported after upgrading to Release v1.2.0.
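To judge which existing data products might need to be re-imported, it can help to check whether a table's Glue metadata lists the same column name more than once. The following is a minimal sketch, not part of the solution itself. It assumes the duplicate arises between the regular columns and the partition keys, which is a common trigger for this Athena error but is not confirmed above. The function name `find_duplicate_columns` and the example table are hypothetical.

```python
from collections import Counter
from typing import List


def find_duplicate_columns(table: dict) -> List[str]:
    """Return column names that appear more than once in a Glue table definition.

    `table` is assumed to have the shape of the "Table" object returned by the
    AWS Glue GetTable API: regular columns under StorageDescriptor.Columns and
    partition columns under PartitionKeys.
    """
    columns = table.get("StorageDescriptor", {}).get("Columns", [])
    partition_keys = table.get("PartitionKeys", [])
    # Compare names in lower case, since Athena treats column names
    # case-insensitively.
    names = [col["Name"].lower() for col in columns + partition_keys]
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical table definition, for illustration only.
example_table = {
    "Name": "example_data_product",
    "StorageDescriptor": {
        "Columns": [
            {"Name": "id", "Type": "string"},
            {"Name": "ingestion_date", "Type": "string"},
        ]
    },
    "PartitionKeys": [{"Name": "ingestion_date", "Type": "string"}],
}

print(find_duplicate_columns(example_table))  # prints ['ingestion_date']
```

Against a real deployment, the dictionary could come from `boto3.client("glue").get_table(DatabaseName=..., Name=...)["Table"]`; the database and table names depend on the deployment and are not given here. A non-empty result suggests the table is affected. An empty result does not rule out other metadata problems.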