dbt-labs / dbt-codegen

Macros that generate dbt code
https://hub.getdbt.com/dbt-labs/codegen/latest/
Apache License 2.0

(Postgres) Macro generate_model_yaml does not work with Materialized Views #155

Closed: drewblinn closed this issue 7 months ago

drewblinn commented 8 months ago

Describe the bug

The generate_model_yaml macro does not return any column information for materialized views.

The macro queries information_schema.columns, which does not store data about materialized views.
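
As a quick illustration (a sketch using the repro model below; names will differ in other environments), the MV is visible to the Postgres catalog but not to information_schema.columns:

-- the materialized view exists according to the catalog
select schemaname, matviewname
from pg_catalog.pg_matviews
where matviewname = 'mv_gene_yaml_test';

-- ...but information_schema.columns returns zero rows for it
select column_name, data_type
from information_schema.columns
where table_name = 'mv_gene_yaml_test';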

Steps to reproduce

Create a materialized view:

{{ config(
    materialized="materialized_view"
) }}

(SELECT
    1 as int_col,
    'one' as string_col,
    FALSE as bool_col)
UNION
(SELECT
    2 as int_col,
    'two' as string_col,
    TRUE as bool_col)
UNION
(SELECT
    3 as int_col,
    'three' as string_col,
    FALSE as bool_col)

Build the model: dbt build -s mv_gene_yaml_test

Run the macro: {{ codegen.generate_model_yaml(model_names=['mv_gene_yaml_test']) }} or dbt run-operation codegen.generate_model_yaml --args '{"model_names": ["mv_gene_yaml_test"]}'

Expected results

The macro should return the model's columns:

version: 2

models:
  - name: mv_gene_yaml_test
    description: ""
    columns:
      - name: int_col
        data_type: integer
        description: ""

      - name: string_col
        data_type: text
        description: ""

      - name: bool_col
        data_type: boolean
        description: ""

Actual results

The macro returns only the model name and description, with no columns:

version: 2

models:
  - name: mv_gene_yaml_test
    description: ""
    columns:

Screenshots and log output

20:31:03 Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f7877fb7df0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f77e6e3b160>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f77e6e3b9d0>]}
20:31:03 Running with dbt=1.7.4
20:31:03 running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'write_json': 'True', 'log_cache_events': 'False', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'debug': 'False', 'profiles_dir': '/usr/src/dbt-server-shared/.dbt', 'log_path': '/usr/src/dbt-server-shared/working_dir/fdf120ab-15d2-46e4-8bab-a9eb6aea9c22', 'version_check': 'True', 'fail_fast': 'False', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'log_format': 'json', 'invocation_command': 'dbt -A dbt_worker.app worker --loglevel=DEBUG --concurrency=2 --max-memory-per-child=500000', 'introspect': 'True', 'static_parser': 'True', 'target_path': 'None', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'send_anonymous_usage_stats': 'True'}
20:31:03 Sending event: {'category': 'dbt', 'action': 'project_id', 'label': '4bf8ed52-966e-49a6-86b5-a003a5746d59', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f786dd0a100>]}
20:31:03 Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': '4bf8ed52-966e-49a6-86b5-a003a5746d59', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f7878f74070>]}
20:31:03 Registered adapter: postgres=1.7.4
20:31:03 Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': '4bf8ed52-966e-49a6-86b5-a003a5746d59', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f786c922be0>]}
20:31:03 Found 89 models, 1 analysis, 4 operations, 1 seed, 55 tests, 62 sources, 0 exposures, 0 metrics, 736 macros, 0 groups, 0 semantic models
20:31:03 Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': '4bf8ed52-966e-49a6-86b5-a003a5746d59', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f786c922a60>]}
20:31:03 Acquiring new postgres connection 'macro_generate_model_yaml'
20:31:03 Using postgres connection "macro_generate_model_yaml"
20:31:03 On macro_generate_model_yaml: BEGIN
20:31:03 Opening a new connection, currently in state init
20:31:04 SQL status: BEGIN in 1.0 seconds
20:31:04 On macro_generate_model_yaml: COMMIT
20:31:04 Using postgres connection "macro_generate_model_yaml"
20:31:04 On macro_generate_model_yaml: COMMIT
20:31:04 SQL status: COMMIT in 0.0 seconds
20:31:04 checksum: 79f000ac4c6d245b166a0e7a4a95b0551d3ac589edfe0d85b64b9ebfeed4ab8c, vars: {}, profile: user, target: , version: 1.7.4
20:31:05 Using postgres connection "macro_generate_model_yaml"
20:31:05 On macro_generate_model_yaml: BEGIN
20:31:06 SQL status: BEGIN in 0.0 seconds
20:31:06 Using postgres connection "macro_generate_model_yaml"
20:31:06 On macro_generate_model_yaml: 
      select
          column_name,
          data_type,
          character_maximum_length,
          numeric_precision,
          numeric_scale

      from dataskrivedb.INFORMATION_SCHEMA.columns
      where table_name = 'mv_gene_yaml_test'

        and table_schema = 'dbt_dblinn'

      order by ordinal_position

20:31:06 SQL status: SELECT 0 in 0.0 seconds
20:31:06 On macro_generate_model_yaml: ROLLBACK
20:31:06 On macro_generate_model_yaml: Close
20:31:06 Resource report: {"command_name": "run-operation", "command_success": true, "command_wall_clock_time": 2.5867724, "process_user_time": 13.458196, "process_kernel_time": 0.286175, "process_mem_max_rss": "192608", "process_in_blocks": "7232", "process_out_blocks": "31208"}
20:31:06 Command `cli run-operation` succeeded at 20:31:06.197740 after 2.59 seconds
20:31:06 Connection 'macro_generate_model_yaml' was properly closed.
20:31:06 Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f7877fb7df0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f786c9223d0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f7878f74040>]}
20:31:06 Flushing usage events
Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f7877fb7df0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f77e6e3b160>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f77e6e3b9d0>]}

System information

The contents of your packages.yml file:

packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1
  - package: dbt-labs/codegen
    version: 0.12.0
  - package: dbt-labs/dbt_project_evaluator
    version: 0.8.0
  - package: Snowflake-Labs/dbt_constraints
    version: 0.6.2
  - package: brooklyn-data/dbt_artifacts
    version: 2.6.2

Which database are you using dbt with? Postgres

The output of dbt --version: The command failed in dbt Cloud, but I am using 1.7.4 as seen in the log above

The operating system you're using: Windows (dbt Cloud)

The output of python --version: N/A

Additional context

The macro queries information_schema.columns, which does not store data about materialized views. Column information for MVs can instead be found in pg_catalog.pg_attribute.
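
For example, a query along these lines (a sketch; the schema and relation names are taken from the log above) does return the MV's columns from the catalog:

select
    a.attname as column_name,
    format_type(a.atttypid, a.atttypmod) as data_type
from pg_catalog.pg_attribute a
join pg_catalog.pg_class c on c.oid = a.attrelid
join pg_catalog.pg_namespace n on n.oid = c.relnamespace
where n.nspname = 'dbt_dblinn'
  and c.relname = 'mv_gene_yaml_test'
  and c.relkind = 'm'          -- 'm' = materialized view
  and a.attnum > 0             -- skip system columns
  and not a.attisdropped
order by a.attnum;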

Are you interested in contributing the fix?

More than happy to help!

dave-connors-3 commented 7 months ago

Hey @drewblinn! Thanks so much for the issue -- I did a little digging, and I think this may indeed be an issue with dbt-postgres. In our macro, we call adapter.get_columns_in_relation here to populate the columns using the query you noted above. That call dispatches to the macro defined in whichever dbt adapter you're using, and in your case that postgres macro is defined here. Indeed, no checks against pg_catalog!
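
For context, the shape of that call chain looks roughly like this (a simplified sketch, not the exact codegen or adapter source): codegen asks the adapter for the relation's columns and can only emit what comes back, so if the adapter macro misses MVs, codegen produces an empty column list.

{# simplified sketch of the per-model lookup inside generate_model_yaml #}
{% set columns = adapter.get_columns_in_relation(ref(model_name)) %}
{% for col in columns %}
      - name: {{ col.name | lower }}
        data_type: {{ col.dtype | lower }}
        description: ""
{% endfor %}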

I opened an issue on dbt-core over here -- hopefully we can get that fixed soon!

drewblinn commented 7 months ago

Thanks for taking a look @dave-connors-3! Let me know if I can help in any way

dbeatty10 commented 7 months ago

Thanks @drewblinn and @dave-connors-3 🏆

Just wrote up an explanation here.

TL;DR: the definition of the information_schema.columns view in Postgres is just missing 'm'::"char", which causes [m]aterialized views not to show up.

@drewblinn that write-up contains a workaround you can try in the meantime. Could you give it a shot and see if it works for you?
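
For readers landing here later, a minimal sketch of that kind of workaround (an assumption about the approach, not necessarily the exact code in the linked write-up) is to shadow the adapter macro in your own project so the column lookup reads from pg_catalog instead of information_schema; dbt's dispatch searches your root project first, so a macros/*.sql file like this takes effect:

{% macro postgres__get_columns_in_relation(relation) -%}
  {# sketch only: reads column metadata from pg_catalog so MVs are included #}
  {% call statement('get_columns_in_relation', fetch_result=True) %}
      select
          a.attname as column_name,
          format_type(a.atttypid, a.atttypmod) as data_type,
          null::integer as character_maximum_length,
          null::integer as numeric_precision,
          null::integer as numeric_scale
      from pg_catalog.pg_attribute a
      join pg_catalog.pg_class c on c.oid = a.attrelid
      join pg_catalog.pg_namespace n on n.oid = c.relnamespace
      where n.nspname = '{{ relation.schema }}'
        and c.relname = '{{ relation.identifier }}'
        and a.attnum > 0
        and not a.attisdropped
      order by a.attnum
  {% endcall %}
  {% set table = load_result('get_columns_in_relation').table %}
  {{ return(sql_convert_columns_in_relation(table)) }}
{%- endmacro %}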

drewblinn commented 7 months ago

@dbeatty10 the workaround works great! thanks for your work on this

dbeatty10 commented 7 months ago

Awesome to hear @drewblinn 🎉

I've opened a draft PR that contains this solution: https://github.com/dbt-labs/dbt-core/pull/9433

dbeatty10 commented 7 months ago

Since we're going to address the underlying issue in dbt-postgres, I'm closing this issue in favor of https://github.com/dbt-labs/dbt-core/issues/9419.