Closed thijs-nijhuis closed 3 weeks ago
@thijs-nijhuis-shell can you email me your artifacts? I have not seen any report like this, and I use incremental models with 1.8.x daily. ben.cassell@databricks.com
@benc-db , I think I found the problem.
If I look into the dbt.log I see this query somewhere at the top:
select
table_name,
if(table_type in ('EXTERNAL', 'MANAGED', 'MANAGED_SHALLOW_CLONE'), 'table', lower(table_type)) as table_type,
lower(data_source_format) as file_format,
table_owner
from `catalog_name`.`information_schema`.`tables`
where table_schema = 'schema_name'
I assume that one is used to fetch all the existing objects all at once. But when I run that command I get no results and I think the reason is similar to this discussion and caused by the fact that our catalog was renamed at some point.
When I change the query into this, I do see all the objects:
select
table_name,
if(table_type in ('EXTERNAL', 'MANAGED', 'MANAGED_SHALLOW_CLONE'), 'table', lower(table_type)) as table_type,
lower(data_source_format) as file_format,
table_owner
from system.`information_schema`.`tables`
where table_catalog = 'catalog_name' and table_schema = 'schema_name'
Interesting. Would you mind filing a ticket through your company's databricks contact about this issue with information schema, giving the context about how it's breaking dbt? I can code a fallback like I did for MV, but it would be more performant if Databricks just fixed the information_schema behavior to work as expected.
Describe the bug
Dbt-databricks 1.8.0 doesn't seem to be able to determine if a table already exists anymore. I have noticed this with both seeds and incremental models.
Seeds We have a seed in our project called 'country'. When running
dbt build --select country
it succeeds if the country table is not created yet. But the second time you run it, it fails with this error messages:When I reinstall dbt-databricks 1.7.14, I can run the command over and over again and the incremental models do create merge statements. I saw there was a fix with rerunning seeds in 1.8.1 but that doesn't solve it for me. Also, we don't have 'persist_doc' set, nor do we have a description for this seed.
Incremental Our incremental models are always run as a 'create or replace' statement instead of a merge after upgrading tot 1.8.0. I see that locally in my target\run folder. I also see it on UC when running 'describe history catalog.schema.table_name' where all recent chnages are 'create or replace' instead of 'merge'.
Steps To Reproduce
Add a seed to the project called country.csv. Use dbt_project.yml to set its target catalog and schema (not sure if that is required). Run
dbt build --select country
or simplydbt seed
; this should work. Then run the same command again and it should fail.For the incremental models, simply create a model, set it to incremental and add this line:
{{ log("For model '"~model.name~"' is_incremental() is set to '"~is_incremental()~"'", True) }}
. The second time the model is run, 'is_incremental()' should be 'True' but it is not.Expected behavior
Automated detection if the table already exists.
Screenshots and log output
See error output above.
System information
The output of
dbt --version
:The operating system you're using: Windows 11 enterprise The output of
python --version
: Python 3.11.4Additional context
I tried to debug the dbt-databricks seeds materialization locally. This line yields 'None' for me where I would expect it to get the relation if the table exists. Weirdly enough, the used parameters (database, schema and identifier) all have the correct value. I see the same thing happening when running from a Databricks workflow. It uses a job cluster so it will get a fresh install of dbt-databricks 1.8.1 on each job run. No other packages installed.