Open stevenconnorg opened 6 months ago
@stevenconnorg Thanks for reporting this and sorry for the late response. I'm okay with the pg_get_cols
approach you described. Would you be willing to contribute the fix? I don't currently have access to a Redshift cluster and I'm therefore unable to test this. I'd appreciate your help :)
@stevenconnorg and @stumelius , what are your thoughts on adding additional support for external table references in addition to aforementioned views for Redshift? I have a local modification that I've made to dbt_utils package, that would likely also be required to dbt-profiler to take svv_external_schemas into account. Below is a local modification I had to make within a dbt_utils macro. Works well thus far.
with internalDbCTE as (
select distinct
table_schema as {{ adapter.quote('table_schema') }},
table_name as {{ adapter.quote('table_name') }},
{{ dbt_utils__vDGS.get_table_types_sql() }}
from {{ database }}.information_schema.tables
where table_schema ilike '{{ schema_pattern }}'
and table_name ilike '{{ table_pattern }}'
and table_name not ilike '{{ exclude }}'
)
-- following necessary for redshift external tables. May not work for other DBs
, externalDbCTE as (
select distinct
schemaname as {{ adapter.quote('schemaname') }},
tablename as {{ adapter.quote('tablename') }},
'EXTERNAL TABLE' as table_type
from pg_catalog.svv_external_columns
where redshift_database_name ilike '{{ database }}'
and schemaname ilike '{{ schema_pattern }}'
and tablename ilike '{{ table_pattern }}'
and tablename not ilike '{{ exclude }}'
)
select * from internalDbCTE
union all
select * from externalDbCTE
Here's an example of a call on my local DB
@odikia Thank your for the suggestion. I don't have anything against adding support for external tables. Now, coming back to the issue of views not being included. In the query you provided, there's a union between information_schema.tables
and pg_catalog.svv_external_columns
- neither of these contain views, right? Do you know where we could get the view data? pg_get_cols
works at the relation level and we'd have to loop over all the relations to get a complete picture - not ideal.
@stumelius , the naming convention may be wonky, but information_schema.tables does contain views! My query is just an example of a union that results in VIEW
, BASE TABLE
, and EXTERNAL TABLE
information. If this union, or just information_schema.tables
is okay given the local results in Redshift (pic I shared in this comment), then it will take a bit more testing given that the dtype differences between pg_catalog and information_schema may not function well downstream for dbt-profiler. I'll develop some use cases for each type of database object locally, to test out the union.
@odikia Got it, thanks for clarifying! Feel free to open a PR when you have something and I'll have a look at it :)
It appears to be an issue in macro
redshift__select_from_information_schema_columns
in getting columns/data types from views.Update to use
pg_get_cols
, e.g.: