awslabs / amazon-redshift-utils

Amazon Redshift Utils contains utilities, scripts and view which are useful in a Redshift environment
Apache License 2.0
2.76k stars 1.25k forks source link

Finding dependencies of materialized views #499

Open redthor opened 4 years ago

redthor commented 4 years ago

Hi there.

The v_view_dependency script: https://github.com/awslabs/amazon-redshift-utils/blob/master/src/AdminViews/v_view_dependency.sql#L1 does not work for materialized views.

I had a table that would not drop without 'cascade'. I could not find a dependency via the view. Dropping the table I discovered a materialized view was dropped. It would be useful if we could use the v_view_dependency view for materialized views.

thanks

vitdexoff commented 4 years ago

It appears that all the views, find_depend and admin views for constraint and view dependency fail to list the source schema and table when it comes to materialized views. Here's an example:

  1. Created table public.test1
  2. Created schema private
  3. Create materialized view private.test1_pmv as select * from public.test1
  4. Queried find_depend (https://docs.aws.amazon.com/redshift/latest/dg/r_DROP_TABLE.html) and v_view_dependency. Following were the results:

dev=# select * from v_view_dependency where dependent_objectname='test1_pmv'; src_oid | src_schemaname | src_objectname | dependent_viewoid | dependent _schemaname | dependent_objectname ---------+----------------+----------------------+-------------------+---------- ------------+---------------------- 329361 | private | mv_tbl__test1_pmv__0 | 329364 | private
| test1_pmv (1 row)

dev=# select * from find_depend where refbyname='test1_pmv'; tbloid | schemaname | name | refbyschemaname | refbyname | viewoid --------+------------+----------------------+-----------------+-----------+--------- 376 | pg_catalog | pg_xactlock | private | test1_pmv | 329364 329361 | private | mv_tbl__test1_pmv__0 | private | test1_pmv | 329364 329364 | private | test1_pmv | private | test1_pmv | 329364 (3 rows)

As evident above, the views fail to list public.test1 as the source schema/object. We probably need modification to the existing scripts to account for such scenarios? It looks like the only way to check for mv dependencies is to look at the view definition... A direct query also work:

select oid, relname from pg_class where oid in (select objid from pg_depend where refobjid = );

jallender commented 4 years ago

While this has not been fixed. Anyone who makes it here may wish to look at https://stackoverflow.com/a/62337897/11395802 for a way to determine if a materialized view has the desired table in its definition.

select schemaname, viewname from pg_views where schemaname not like 'pg_catalog' and schemaname not like 'information_schema' and definition like '%<tablename>%';

lennertr commented 3 years ago

Hi, I stumbled upon the same issue. @jallender answer is good if you need to know if there are any dependencies for a specific table but I needed the dependencies for any table in one schema. I ended up with this solution:

create view admin.v_view_dependencies_fixed as (
  with h1 as (
      select generate_series as i
      from generate_series(1, 100) -- we can use this since the query only touches the leader node
  ),
  h2 as (
      select schemaname                                         as dependent_schema,
             viewname                                           as dependent_view,
             schemaname || '.' || viewname                      as dependent_full,
             regexp_substr(definition, 'schema_name\\.\\w+', 1, i) as dependency
      from pg_views
               cross join h1
      where schemaname = 'schema_name'
        and dependency is not null
        and dependency != ''
  )
  select distinct
         dependent_full,
         dependent_schema,
         dependent_view,
         dependency as source_full,
         split_part(dependency, '.', 1) as source_schema,
         split_part(dependency, '.', 2) as source_object
  from h2
  where dependent_full != source_full
);
chrisrink10 commented 3 years ago

I expanded on the previous commenter's view definition to handle a few extra things:

create view admin.v_mv_dependency as (
    with schemas as (
        -- select all non-system, non-administrative schemas
        select nspname
        from pg_namespace
        where nspname not like 'pg_%'
          and nspname not in ('information_schema', 'admin')
    ),
    schema_rels as (
        -- select all tables and views from all eligible schemas
        (
            select nspname as schemaname,
                   pgv.viewname as relname,
                   'view' as relkind
            from schemas
            join pg_views pgv on (schemas.nspname = pgv.schemaname)
        )
        union
        (
            select nspname as schemaname,
                   pgt.tablename as relname,
                   'table' as relkind
            from schemas
            join pg_tables pgt on (schemas.nspname = pgt.schemaname)
        )
    ),
    mviews as (
        -- select all materialized view definitions from the MV list
        select schemaname,
               viewname,
               definition
        from pg_views
        -- not exactly foolproof, but should be pretty decent at catching only MVs
        where definition ilike '%create materialized view%'
    ),
    possible_mview_dep_matches as (
        -- compute a list of possible matches just using simple substring heuristic
        -- on either schema name or relation (view/table) name
        select mvs.schemaname,
               mvs.viewname,
               any_value(mvs.definition) as definition,
               src.schemaname as objschema,
               relname as objname,
               any_value(relkind) as objkind
        from mviews mvs
            cross join schema_rels src
        where objname is not null
          and (strpos(mvs.definition, src.schemaname) != 0 or
               strpos(mvs.definition, src.relname) != 0)
        group by 1, 2, 4, 5
    ),
    mv_ref_matches as (
        (
            -- from the set of possible matches, find any possible table name matches
            -- by checking for FROM {table} and JOIN {table} within the view definition
            select vw.schemaname                       as dependent_schema,
                   vw.viewname                         as dependent_view,
                   vw.schemaname || '.' || vw.viewname as dependent_full,
                   vw.objschema                        as source_schema,
                   vw.objname                          as source_object,
                   vw.objkind                          as source_kind,
                   regexp_substr(vw.definition,
                                 '(from|join)[[:blank:]]+(' || vw.objschema || '\\.)?(' || vw.objname || ')', 1, 1,
                                 'i')                  as dependency_match,
                   'relation'                          as match_kind
            from possible_mview_dep_matches as vw
            where dependency_match is not null
              and dependency_match != ''
        )
        union
        (
            -- from the set of possible matches, find any possible column name matches
            -- by checking for ({schema}.)?{table}.{column}
            select vw.schemaname                                                                     as dependent_schema,
                   vw.viewname                                                                       as dependent_view,
                   vw.schemaname || '.' || vw.viewname                                               as dependent_full,
                   vw.objschema                                                                      as source_schema,
                   vw.objname                                                                        as source_object,
                   vw.objkind                                                                        as source_kind,
                   regexp_substr(vw.definition,
                                 '(' || vw.objschema || '\\.)?(' || vw.objname || ')\\.\\w+', 1, 1,
                                 'i')                                                                as dependency_match,
                   'column' as match_kind
            from possible_mview_dep_matches as vw
            where dependency_match is not null
              and dependency_match != ''
        )
    )
    select distinct dependent_full,
                    dependent_schema,
                    dependent_view,
                    dependency_match as source_full,
                    source_schema,
                    source_object,
                    source_kind,
                    match_kind
    from mv_ref_matches
    where dependent_full != source_full
);

There are some limitations with my approach which may make it unacceptable for some users:

nickthorpe commented 3 years ago

@chrisrink10 thank you for putting this together!

simonB2020 commented 1 year ago

@chrisrink10 Thanks.

I have enhanced the regex for 'relation'. My regex skills are not perfect so feel free to check these before using...

'(from|join)([[:blank:]]|\s)+((")?' || vw.objschema || '(")?\.)?((")?' || vw.objname || '(")?)([[:blank:]]|\s)'

Also looked at the 'column' regex.

'((")?' || vw.objschema || '(")?\.)((")?' || vw.objname || '(")?\.)\w+'

Cannot say that I have tested all scenarios - but working for me so far ....

JPags commented 1 week ago

So I tried the solutions listed and some of my schema names were coming up wrong, so I could not use to them to accomplish my task.

I had gotten an email from an AWS Redshift consultant we were working with and this is the query she gave us (Its simular to the original one in the view, but that does not work right for my case either so I figured I would give this a try.

select distinct srcnsp.nspname as src_schemaname,srcobj.relname as src_object, refnsp.nspname AS ref_schemaname, refobj.relname as ref_object from pg_catalog.pg_depend dep join pg_catalog.pg_class refobj on dep.refobjid=refobj.oid join pg_catalog.pg_class srcobj on dep.objid=srcobj.oid join pg_catalog.pg_namespace srcnsp on srcobj.relnamespace = srcnsp.oid join pg_catalog.pg_namespace refnsp on refobj.relnamespace = refnsp.oid;

this seems to get me the correct data that I needed. I have not verified every thing but it worked for my case.

Figured I would post incase anyone else needed a solution