dbt-labs / dbt-project-evaluator

This package contains macros and models to find DAG issues automatically
https://dbt-labs.github.io/dbt-project-evaluator/latest/
Apache License 2.0
447 stars 67 forks source link

Fix `base_source_columns` and `base_node_columns` in Redshift #467

Closed stevenconnorg closed 2 months ago

stevenconnorg commented 4 months ago

This is a:

Link to Issue

Description & motivation

Integration Test Screenshot

Screenshot 2024-06-24 at 1 13 07 PM

Checklist


b-per commented 4 months ago

Hi @stevenconnorg

Thanks for the work on this one!

At first, I am a bit worried of moving to varchar(50000) and changing all the type_string() in those models to type_large_string() for all users of the package.

From what I read here , I understand that Redshift does some optimization/memory allocation based on the types and I wouldn't want people who don't have any issue today with Redshift to be negatively impacted by performance following this change.

In your specific case, do you know what specific column(s) was/were creating the issue? And what varchar size this column has?

We might then be able to target the changes required to fix the reported issues.

stevenconnorg commented 4 months ago

Hey @b-per

For some reason, dbt.type_string() is creating character varying(1) columns, so this is causing the majority of the issues in these models (also reported in #463)

However, I'm still running into issues with redshift__type_large_string() in the description column, with the max string length around ~13k characters.

So maybe we bump the redshift__type_large_string() length to 15-20 thousand, and then figure out what's going on with the dbt.type_string()?

b-per commented 4 months ago

Yes, I think it is better to get to the bottom of it.

Do you have jq installed? If not, could you install it and do (I am on Mac, but we could do something similar in Windows)

cat target/manifest.json | jq '.macros | .[] | select(.name | contains("type_string"))'

and copy the results here?

My results look like

{
  "name": "type_string",
  "resource_type": "macro",
  "package_name": "dbt",
  "path": "macros/utils/data_types.sql",
  "original_file_path": "macros/utils/data_types.sql",
  "unique_id": "macro.dbt.type_string",
  "macro_sql": "\n\n{%- macro type_string() -%}\n  {{ return(adapter.dispatch('type_string', 'dbt')()) }}\n{%- endmacro -%}\n\n",
  "depends_on": {
    "macros": [
      "macro.dbt.default__type_string"
    ]
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719397302.694219,
  "supported_languages": null
}
{
  "name": "default__type_string",
  "resource_type": "macro",
  "package_name": "dbt",
  "path": "macros/utils/data_types.sql",
  "original_file_path": "macros/utils/data_types.sql",
  "unique_id": "macro.dbt.default__type_string",
  "macro_sql": "{% macro default__type_string() %}\n    {{ return(api.Column.translate_type(\"string\")) }}\n{% endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719397302.694326,
  "supported_languages": null
}
{
  "name": "redshift__type_string",
  "resource_type": "macro",
  "package_name": "dbt_project_evaluator",
  "path": "macros/cross_db_shim/redshift_shims.sql",
  "original_file_path": "macros/cross_db_shim/redshift_shims.sql",
  "unique_id": "macro.dbt_project_evaluator.redshift__type_string",
  "macro_sql": "{%- macro redshift__type_string() -%}\n  {{ \"VARCHAR(600)\" }}\n{%- endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719397302.810622,
  "supported_languages": null
}
stevenconnorg commented 4 months ago

Hey @b-per , here are the results from the jq command:

{
  "name": "type_string",
  "resource_type": "macro",
  "package_name": "dbt",
  "path": "macros/utils/data_types.sql",
  "original_file_path": "macros/utils/data_types.sql",
  "unique_id": "macro.dbt.type_string",
  "macro_sql": "\n\n{%- macro type_string() -%}\n  {{ return(adapter.dispatch('type_string', 'dbt')()) }}\n{%- endmacro -%}\n\n",
  "depends_on": {
    "macros": [
      "macro.dbt.default__type_string"
    ]
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348795.668191,
  "supported_languages": null
}
{
  "name": "default__type_string",
  "resource_type": "macro",
  "package_name": "dbt",
  "path": "macros/utils/data_types.sql",
  "original_file_path": "macros/utils/data_types.sql",
  "unique_id": "macro.dbt.default__type_string",
  "macro_sql": "{% macro default__type_string() %}\n    {{ return(api.Column.translate_type(\"string\")) }}\n{% endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348795.668301,
  "supported_languages": null
}
{
  "name": "type_string",
  "resource_type": "macro",
  "package_name": "dbt_profiler",
  "path": "macros/cross_db_utils.sql",
  "original_file_path": "macros/cross_db_utils.sql",
  "unique_id": "macro.dbt_profiler.type_string",
  "macro_sql": "\n\n{%- macro type_string() -%}\n  {{ return(adapter.dispatch(\"type_string\", macro_namespace=\"dbt_profiler\")()) }}\n{%- endmacro -%}\n\n",
  "depends_on": {
    "macros": [
      "macro.dbt_profiler.default__type_string"
    ]
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348795.795249,
  "supported_languages": null
}
{
  "name": "default__type_string",
  "resource_type": "macro",
  "package_name": "dbt_profiler",
  "path": "macros/cross_db_utils.sql",
  "original_file_path": "macros/cross_db_utils.sql",
  "unique_id": "macro.dbt_profiler.default__type_string",
  "macro_sql": "{%- macro default__type_string() -%}\n  varchar\n{%- endmacro -%}\n\n",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348795.7953038,
  "supported_languages": null
}
{
  "name": "bigquery__type_string",
  "resource_type": "macro",
  "package_name": "dbt_profiler",
  "path": "macros/cross_db_utils.sql",
  "original_file_path": "macros/cross_db_utils.sql",
  "unique_id": "macro.dbt_profiler.bigquery__type_string",
  "macro_sql": "{%- macro bigquery__type_string() -%}\n  string\n{%- endmacro -%}\n\n",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348795.7953548,
  "supported_languages": null
}
{
  "name": "databricks__type_string",
  "resource_type": "macro",
  "package_name": "dbt_profiler",
  "path": "macros/cross_db_utils.sql",
  "original_file_path": "macros/cross_db_utils.sql",
  "unique_id": "macro.dbt_profiler.databricks__type_string",
  "macro_sql": "{%- macro databricks__type_string() -%}\n  string\n{%- endmacro -%}\n\n\n",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348795.795406,
  "supported_languages": null
}
{
  "name": "redshift__type_string",
  "resource_type": "macro",
  "package_name": "dbt_project_evaluator",
  "path": "macros/cross_db_shim/redshift_shims.sql",
  "original_file_path": "macros/cross_db_shim/redshift_shims.sql",
  "unique_id": "macro.dbt_project_evaluator.redshift__type_string",
  "macro_sql": "{%- macro redshift__type_string() -%}\n  {{ \"VARCHAR(600)\" }}\n{%- endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348795.9382882,
  "supported_languages": null
}
{
  "name": "edr_type_string",
  "resource_type": "macro",
  "package_name": "elementary",
  "path": "macros/utils/data_types/data_type.sql",
  "original_file_path": "macros/utils/data_types/data_type.sql",
  "unique_id": "macro.elementary.edr_type_string",
  "macro_sql": "\n\n\n{%- macro edr_type_string() -%}\n    {{ return(adapter.dispatch('edr_type_string', 'elementary')()) }}\n{%- endmacro -%}\n\n",
  "depends_on": {
    "macros": [
      "macro.elementary.postgres__edr_type_string"
    ]
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348796.390201,
  "supported_languages": null
}
{
  "name": "default__edr_type_string",
  "resource_type": "macro",
  "package_name": "elementary",
  "path": "macros/utils/data_types/data_type.sql",
  "original_file_path": "macros/utils/data_types/data_type.sql",
  "unique_id": "macro.elementary.default__edr_type_string",
  "macro_sql": "{% macro default__edr_type_string() %}\n    {# Redshift #}\n    {% do return(\"varchar(4096)\") %}\n{% endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348796.390293,
  "supported_languages": null
}
{
  "name": "postgres__edr_type_string",
  "resource_type": "macro",
  "package_name": "elementary",
  "path": "macros/utils/data_types/data_type.sql",
  "original_file_path": "macros/utils/data_types/data_type.sql",
  "unique_id": "macro.elementary.postgres__edr_type_string",
  "macro_sql": "{% macro postgres__edr_type_string() %}\n    {% if var(\"sync\", false) %}\n        {% do return(\"text\") %}\n    {% else %}\n        {% do return(\"varchar(4096)\") %}\n    {% endif %}\n{% endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348796.390466,
  "supported_languages": null
}
{
  "name": "snowflake__edr_type_string",
  "resource_type": "macro",
  "package_name": "elementary",
  "path": "macros/utils/data_types/data_type.sql",
  "original_file_path": "macros/utils/data_types/data_type.sql",
  "unique_id": "macro.elementary.snowflake__edr_type_string",
  "macro_sql": "{% macro snowflake__edr_type_string() %}\n    {# Default max varchar size in Snowflake is 16MB #}\n    {% do return(\"varchar\") %}\n{% endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348796.3905568,
  "supported_languages": null
}
{
  "name": "bigquery__edr_type_string",
  "resource_type": "macro",
  "package_name": "elementary",
  "path": "macros/utils/data_types/data_type.sql",
  "original_file_path": "macros/utils/data_types/data_type.sql",
  "unique_id": "macro.elementary.bigquery__edr_type_string",
  "macro_sql": "{% macro bigquery__edr_type_string() %}\n    {# Default max string size in Bigquery is 65K #}\n    {% do return(\"string\") %}\n{% endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348796.390642,
  "supported_languages": null
}
{
  "name": "spark__edr_type_string",
  "resource_type": "macro",
  "package_name": "elementary",
  "path": "macros/utils/data_types/data_type.sql",
  "original_file_path": "macros/utils/data_types/data_type.sql",
  "unique_id": "macro.elementary.spark__edr_type_string",
  "macro_sql": "{% macro spark__edr_type_string() %}\n    {% do return(\"string\") %}\n{% endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348796.390722,
  "supported_languages": null
}
{
  "name": "athena__edr_type_string",
  "resource_type": "macro",
  "package_name": "elementary",
  "path": "macros/utils/data_types/data_type.sql",
  "original_file_path": "macros/utils/data_types/data_type.sql",
  "unique_id": "macro.elementary.athena__edr_type_string",
  "macro_sql": "{% macro athena__edr_type_string() %}\n    {% do return(\"varchar\") %}\n{% endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348796.390801,
  "supported_languages": null
}
{
  "name": "trino__edr_type_string",
  "resource_type": "macro",
  "package_name": "elementary",
  "path": "macros/utils/data_types/data_type.sql",
  "original_file_path": "macros/utils/data_types/data_type.sql",
  "unique_id": "macro.elementary.trino__edr_type_string",
  "macro_sql": "{% macro trino__edr_type_string() %}\n    {% do return(\"varchar\") %}\n{% endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348796.390879,
  "supported_languages": null
}
b-per commented 2 months ago

With #475 implemented and released in the latest version, I think that this is not needed anymore. Let's open a new issue if there is still an issue.