Closed JCZuurmond closed 2 years ago
Would be awesome if we cut a release after this PR is merged. We rely on external tables combined with incremental strategies. Right now, due to the typo, the external tables have no partitions, so an incremental strategy that filters on the partition column still has to read the full table, which is a waste of resources.
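For context, this is the kind of query that suffers (table and column names are illustrative, not from this PR): without registered partitions, the filter below cannot prune anything and Spark scans every file under the table's location.

```sql
-- Illustrative incremental-model predicate filtering on a partition column.
-- If the external table's partitions were never registered in the metastore,
-- this predicate cannot prune partitions and the whole table is scanned.
select *
from my_external_table
where partition_date > (select max(partition_date) from {{ this }})
```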
Actually, we should be a bit careful here. Apparently you need to run a repair command after the table is created: as the linked page explains, partitioned tables created from existing data will return no rows until a separate repair command is run against the table.
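The repair step can be sketched as follows (schema and table names are placeholders; exact syntax depends on the Spark/Hive version):

```sql
-- After creating an external table over existing data, register its
-- partitions in the metastore; until then, queries return no rows.
msck repair table my_schema.my_external_table;

-- Spark SQL equivalent:
alter table my_schema.my_external_table recover partitions;
```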
@JCZuurmond Aha! That seems to answer the finding by @guillesd a few weeks ago: https://github.com/dbt-labs/dbt-external-tables/pull/113#issuecomment-964210274
I'm working separately on fixing the integration tests in this package, so we can officially declare v1.0 compatibility
This PR needs more work than a typo fix. I will do some additional testing with the repair table command; after that I will propose an approach.
Databricks CI still fails
Oh whoops! Looks like the error is:
```
13:38:36 Encountered an error while running operation: Runtime Error
Runtime Error
('42000', '[42000] [Simba][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.parser.ParseException: \nDataType varchar is not supported.(line 14, pos 28)\n\n== SQL ==\n/* {"app": "dbt", "dbt_version": "1.0.0", "profile_name": "integration_ (80) (SQLExecDirectW)')
```
Which should be fixable by changing `varchar` to `string` here:
https://github.com/dbt-labs/dbt-external-tables/blob/d17892ba756c5746847cbafd050e0ae3a252131c/integration_tests/models/plugins/spark/spark_external.yml#L38-L40
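The change would look roughly like this in `spark_external.yml` (a sketch; the column name is illustrative — see the linked lines for the actual spec):

```yaml
columns:
  - name: some_column     # illustrative name, not from the real file
    data_type: string     # was: varchar, which Spark's parser rejects
```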
All lights are green!
Description & motivation
The partitions were not working properly. Fixes #108 .
First, there was a typo, so the partition clause was never added.
After doing some testing we learned that the partitions have to be recovered after creating the table. This is also needed for non-full-refresh runs, because new partitions are only detected after running the `RECOVER PARTITIONS` command.

Checklist