Embedding views with sub-selects

When embedding views, the following breaks the detection algorithm:

-- this works
CREATE VIEW german_films AS
SELECT *,
       (SELECT 'hello') AS world
  FROM films
 WHERE language='de';

-- while this does not work
CREATE VIEW german_films AS
SELECT *,
       COALESCE((SELECT 'hello'), 'bye') AS world
  FROM films
 WHERE language='de';

What happens here is:

the SQL query in allSourceColumns tries to parse the view definition in form of a pg_node_tree (https://www.postgresql.org/docs/current/catalog-pg-rewrite.html)
to be able to read the embeddable columns, all sub-selects have to be filtered out
to filter out the sub-selects a regex is used
this regex fails when there is a function call around it (e.g. COALESCE)
because the regex fails, no columns are detected at all for embedding

My current solution is to wrap another SELECT around the COALESCE - that works fine.

I have played a bit with changing up the query in allSourceColumns - I think the structure of pg_node_tree does not allow a proper regex solution to really get rid of all sub-selects. However, I was able to apply a couple of transformations to the node tree to bring it to JSON format - which can easily be parsed for the embeddable columns. This was only tested with PG12 so far and also not very extensively, yet. (just noted that it fails on some of the pg_catalog views apparently - so there is definitely some more tweaking needed for the transform_json part)

The query currently looks like this:


with
views as (
select
   n.nspname   as view_schema,
   c.relname   as view_name,
   r.ev_action as view_definition
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
join pg_rewrite r on r.ev_class = c.oid
where c.relkind in ('v', 'm') and n.nspname = 'public' -- replace in haskell with ANY ($1)
),
transform_json as (
   select
     view_schema,
     view_name,
     replace(
      regexp_replace(
       regexp_replace(
        regexp_replace(
         replace(
          replace(
           replace(
            replace(
             view_definition,
            '({', '[{'),
           '})', '}]'),
          '} {', '},{'),
         '"', '\"'),
        '{([A-Z]+)', '{"node":"\1"', 'g'),
       ':([A-Za-z_]+) ([^[{][^:}]*)( |})', ',"\1":"\2"\3', 'g'),
      ':([A-Za-z_]+) ', ',"\1":', 'g'),
     '\ ', ' ')::jsonb AS view_definition
   from views
),
target_lists as (
   select
     view_schema,
     view_name,
     view_definition->0->'targetList' AS target_list
   from transform_json
),
target_entries as (
   select
     view_schema,
     view_name,
     jsonb_array_elements(target_list) as target_entry
   from target_lists
   where jsonb_typeof(target_list) = 'array'
),
results as (
   select
     view_schema,
     view_name,
     target_entry->>'resname' AS view_column_name,
     target_entry->>'resorigtbl' AS resorigtbl,
     target_entry->>'resorigcol' AS resorigcol
   from target_entries
)
select
   sch.nspname as table_schema,
   tbl.relname as table_name,
   col.attname as table_column_name,
   res.view_schema,
   res.view_name,
   res.view_column_name
from results res
join pg_class tbl on tbl.oid::text = res.resorigtbl
join pg_attribute col on col.attrelid = tbl.oid and col.attnum::text = res.resorigcol
join pg_namespace sch on sch.oid = tbl.relnamespace
where resorigtbl <> '0'
order by view_schema, view_name, view_column_name;

In theory this should allow a much more robust parsing of the view definition, but ONLY IF the json transformation works flawlessly.

Just thought I post that here, in case anyone wants to play with this, before I continue to.

PostgREST / postgrest

Embedding views with sub-selects #1608