Open Bonnevie opened 1 year ago
Thanks for opening this @Bonnevie !
I made an attempt at reproducing this, but wasn't able to trigger the same error as you. Could you tweak the simple example below to reproduce what you were seeing?
I am using a local postgres database as described here so you'll need to tweak your database, schema, etc as-needed.
models/_sources.yml
version: 2

sources:
  - name: dbt_dbeatty
    database: postgres
    schema: dbt_dbeatty
    tables:
      - name: person
snapshots/my_snapshot.sql
{% snapshot my_snapshot %}

{{
    config(
        target_database='postgres',
        target_schema='snapshots',
        strategy='check',
        check_cols='all',
        unique_key='name'
    )
}}

select * from {{ source("dbt_dbeatty", "person") }}

{% endsnapshot %}
1) Build source table with custom enum type in postgres (using example here)
Using postgres IDE of choice:
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
CREATE TABLE dbt_dbeatty.person (
name text,
current_mood mood
);
INSERT INTO dbt_dbeatty.person VALUES ('Moe', 'happy');
2) Take a snapshot:
dbt build
3) Truncate table, then rebuild
Using postgres IDE of choice again:
TRUNCATE TABLE dbt_dbeatty.person;
INSERT INTO dbt_dbeatty.person VALUES ('Curly', 'ok');
4) Take another snapshot:
dbt build
I'm also experiencing this issue. The macro postgres__get_columns_in_relation, which is called when the snapshot checks for column changes, selects data_type from the information_schema. For any UDT this will return USER-DEFINED. We can instead conditionally select udt_name to get the name of the enum.
As a temporary fix, I have overridden postgres__get_columns_in_relation in override_default_macros.sql to resolve udt names along with their schema:
{% macro postgres__get_columns_in_relation(relation) -%}
  {% call statement('get_columns_in_relation', fetch_result=True) %}
      select
          column_name,
          case
              when data_type = 'USER-DEFINED' then udt_schema || '.' || udt_name
              else data_type
          end as data_type,
          character_maximum_length,
          numeric_precision,
          numeric_scale
      from {{ relation.information_schema('columns') }}
      where table_name = '{{ relation.identifier }}'
      {% if relation.schema %}
      and table_schema = '{{ relation.schema }}'
      {% endif %}
      order by ordinal_position
  {% endcall %}
  {% set table = load_result('get_columns_in_relation').table %}
  {{ return(sql_convert_columns_in_relation(table)) }}
{% endmacro %}
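To see the behavior that the case expression above works around, you can query information_schema.columns directly. This sketch assumes the mood enum and dbt_dbeatty.person table from the repro steps, with the enum created in the public schema; the sample output is illustrative:

```sql
-- For a user-defined enum column, data_type collapses to 'USER-DEFINED',
-- while udt_schema / udt_name still carry the real type name.
select column_name, data_type, udt_schema, udt_name
from information_schema.columns
where table_schema = 'dbt_dbeatty'
  and table_name = 'person'
order by ordinal_position;

--  column_name  |  data_type   | udt_schema | udt_name
-- --------------+--------------+------------+----------
--  name         | text         | pg_catalog | text
--  current_mood | USER-DEFINED | public     | mood
```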
After attempting again with the following steps, I was able to reproduce the reported issue.
Also confirmed that the solution from @benorourke worked 🏅
1) Create a simple table that uses only built-in data types
Using postgres IDE of choice:
CREATE TABLE dbt_dbeatty.person (
name text
);
INSERT INTO dbt_dbeatty.person VALUES ('Moe');
2) Take a snapshot:
dbt snapshot
3) Add a column that uses a custom data type
Build source table with custom enum type in postgres (using example here).
Using postgres IDE of choice again:
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
-- Yeah, a simple ALTER TABLE ... would be better here! But this is sufficient to reproduce
DROP TABLE dbt_dbeatty.person;
CREATE TABLE dbt_dbeatty.person (
name text,
current_mood mood
);
INSERT INTO dbt_dbeatty.person VALUES ('Curly', 'ok');
4) Take another snapshot:
dbt snapshot
💥
5) Override the postgres__get_columns_in_relation macro
Copy-paste the solution described here (untested) or here into macros/overrides.sql
6) Retry taking another snapshot:
dbt snapshot
✅
I have a similar bug related to this: I have an incremental table with on_schema_change='sync_all_columns'. If I now add a column with a user-defined type to the incremental model, the same error occurs. It can be fixed with the provided solution.
Environment: PostgreSQL, dbt 1.6.1
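For reference, a minimal incremental model that hits the same code path might look like the following. The model and column names are hypothetical; only the on_schema_change setting matters for reproducing this:

```sql
-- models/person_incremental.sql (hypothetical)
{{
    config(
        materialized='incremental',
        on_schema_change='sync_all_columns',
        unique_key='name'
    )
}}

select * from {{ source("dbt_dbeatty", "person") }}
```

When a new column with a UDT is added to the source, the sync step compares column types via postgres__get_columns_in_relation and trips over the USER-DEFINED placeholder in the same way snapshots do.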
Here is an alternative override for postgres__get_columns_in_relation that might be better (but I didn't test it out with this issue):
{% macro postgres__get_columns_in_relation(relation) -%}
  {% call statement('get_columns_in_relation', fetch_result=True) %}
      select
          col.attname as column_name,
          -- Two options for formatting the data type:
          -- 1) do not include the type modifier (e.g., `geometry`)
          -- 2) include the type modifier when applicable (e.g., `geometry(Point,4326)`)
          -- Option 1:
          pg_catalog.format_type(col.atttypid, null) as data_type,
          -- Option 2:
          -- pg_catalog.format_type(col.atttypid, col.atttypmod) as data_type,
          null as character_maximum_length,
          null as numeric_precision,
          null as numeric_scale
      from pg_catalog.pg_namespace sch
      join pg_catalog.pg_class tbl on tbl.relnamespace = sch.oid
      join pg_catalog.pg_attribute col on col.attrelid = tbl.oid
      left outer join pg_catalog.pg_description tbl_desc on (tbl_desc.objoid = tbl.oid and tbl_desc.objsubid = 0)
      left outer join pg_catalog.pg_description col_desc on (col_desc.objoid = tbl.oid and col_desc.objsubid = col.attnum)
      where 1=1
        and not pg_is_other_temp_schema(sch.oid) -- not a temporary schema belonging to another session
        and tbl.relpersistence in ('p', 'u') -- [p]ermanent table or [u]nlogged table; exclude [t]emporary tables
        and tbl.relkind in ('r', 'v', 'f', 'p') -- o[r]dinary table, [v]iew, [f]oreign table, [p]artitioned table; other values are [i]ndex, [S]equence, [c]omposite type, [t]oast table, [m]aterialized view
        and col.attnum > 0 -- negative numbers are used for system columns such as oid
        and not col.attisdropped -- column has not been dropped
        and tbl.relname = '{{ relation.identifier }}'
        {% if relation.schema %}
        and sch.nspname = '{{ relation.schema }}'
        {% endif %}
      order by
          sch.nspname,
          tbl.relname,
          col.attnum
  {% endcall %}
  {% set table = load_result('get_columns_in_relation').table %}
  {{ return(sql_convert_columns_in_relation(table)) }}
{% endmacro %}
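To compare the two formatting options, format_type can be called directly. This assumes the mood enum from the repro steps exists and is on the search_path; the second call uses a hypothetical typmod value of 36, which is how Postgres internally encodes varchar(32):

```sql
-- Option 1 style: type name only
select pg_catalog.format_type('mood'::regtype, null);      -- mood

-- Option 2 style: type name plus modifier
select pg_catalog.format_type('varchar'::regtype, 36);     -- character varying(32)
```

Note that format_type schema-qualifies a type only when it is not visible on the current search_path, so the two macros may disagree on whether data_type comes back qualified.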
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
This issue also exists in unit tests. The workaround provided by @dbeatty10 gets us past the USER-DEFINED cast in the generated SQL for the unit test, but any default geometry values given in the input/expect rows are wrapped as strings. I'm guessing there is another macro somewhere that would fix that.
Is this a new bug in dbt-core?
Current Behavior
I've set up a snapshot table in a postgres database, and it has a column named annotation_type that uses a custom enum datatype. This has worked without problems for weeks, but after the table had to be truncated and restored due to a faulty replica setup, we encountered an error when taking a snapshot (I cannot find the sql file that the error is in reference to). The snapshot table was also suddenly lacking the annotation_type column, despite snapshots supposedly never deleting columns. We are at a development stage where we can still afford to do a full restart of the snapshot, but we'd rather not encounter this in the future. Our hypothesis is that the custom type is somehow not matched appropriately following regeneration.
Expected Behavior
I would expect snapshots to be robust to no-ops like this, and at least to never incur data loss. If custom types are unsupported/problematic, it'd be nice to be warned.
Steps To Reproduce
Not easily reproduced, but most likely course:
Relevant log output
No response
Environment
No response
Which database adapter are you using with dbt?
postgres
Additional Context
No response