I created an iceberg table with the 'partition_by' option, and when I use it to run unit tests or create docs, it misbehaves because it reads the wrong value like 'part 0' into the schema.
{{ config(
materialized='incremental',
file_format='iceberg',
iceberg_expire_snapshots='False',
table_properties={'format-version': '2'},
) }}
WITH data as (
SELECT app_name,
device_id,
row_number() OVER (
PARTITION BY device_id
ORDER BY app_name, device_id DESC
) as rn
FROM {{ source('schema_name', 'source_table') }}
GROUP BY
app_name,
device_id
ORDER BY rn DESC
)
SELECT device_id as filter_ids
FROM data
WHERE rn > 1
GROUP BY device_id
unit test profile
unit_tests:
- name: test_hello_world
# Always only one transformation to test
model: hello_world
# No inputs needed this time!
# Most unit tests will have inputs -- see the "real world example" section below
given:
- input : source('schema_name', 'source_table')
rows:
- {device_id : test123, app_name : test1}
- {device_id : test123, app_name : test1}
- {device_id : test456, app_name : test2}
# Expected output can have zero to many rows
expect:
rows:
- {filter_ids: test123}
- {filter_ids: test456}
Expected behavior
when 'dbt test --select hello_world --target dev
run 'ok'
Screenshots and log output
error log
-- Fixture for source_table
select cast('test1' as string)
as app_name, cast(null as string) as status, cast('test123' as string)
.... (other columns)
cast(null as timestamp) as part 0
-----------------------------------------^^^
), data as (
SELECT app_name,
device_id,
row_number() OVER (
PARTITION BY device_id
ORDER BY app_name, device_id DESC
) as rn
FROM __dbt__cte__source_table
GROUP BY
app_name,
device_id
ORDER BY rn DESC
)
SELECT device_id as filter_ids
FROM data
WHERE rn > 1
GROUP BY device_id
) as __dbt_sbq
where false
limit 0
System information
The output of dbt --version:
Core:
- installed: 1.8.4
- latest: 1.8.4 - Up to date!
Plugins:
- spark: 1.8.0 - Up to date!
- glue: 1.8.1 - Up to date!
The operating system you're using:
The output of python --version:
Additional context
The problem occurs when doing unit tests, but also when importing schema from glue metadata via dbt docs generate, 'part 0' is generated unnecessarily.
Describe the bug
I created an iceberg table with the 'partition_by' option, and when I use it to run unit tests or create docs, it misbehaves because it reads the wrong value like 'part 0' into the schema.
Steps To Reproduce
make source_table config for Iceberg table:
use source profile example
test model (hello_world.sql)
unit test profile
Expected behavior
when 'dbt test --select hello_world --target dev run 'ok'
Screenshots and log output
error log
System information
The output of
dbt --version
:The operating system you're using:
The output of
python --version
:Additional context
The problem occurs when doing unit tests, but also when importing schema from glue metadata via dbt docs generate, 'part 0' is generated unnecessarily.