dbt-labs / dbt-redshift

dbt-redshift contains all of the code enabling dbt to work with Amazon Redshift
https://getdbt.com
Apache License 2.0
101 stars 59 forks source link

[CT-2161] [Feature] Support altering tables dist/sort keys #333

Open alejandrofm opened 1 year ago

alejandrofm commented 1 year ago

Is this a new bug in dbt-redshift?

Current Behavior

Edited the size of a varchar field on Redshift, DBT makes an update to recreate the column, after the update it tries to drop de column but can't because is a SORTKEY.

Expected Behavior

The SORTKEY must be changed (or dropped) before trying to update the field, and then it have to be recreated

Steps To Reproduce

Change the length of a varchar field (must be sortkey) on the destination table on Redshift. Run the dbt model

Relevant log output

[2023-02-22, 16:57:40 UTC] {pod_manager.py:228} INFO - 16:57:40  Database Error in model xxxx.sql (models/ab_testing/xxxx.sql)
[2023-02-22, 16:57:40 UTC] {pod_manager.py:228} INFO - 16:57:40    cannot drop sortkey column "column_2"

Environment

- Python: 3.9
- dbt-core: 1.3.1
- dbt-redshift: 1.3.1

Additional Context

No response

dbeatty10 commented 1 year ago

Thanks for reporting this @alejandrofm !

Could you share a simple example of a model (and its configuration) that could run into this problem? Specifically, it would be nice to see which materialization type you are using (table, incremental, etc) and also which config parameters you are using. e.g., could you share something like this?

-- Example with interleaved sort keys
{{ config(materialized='table',
          sort_type='interleaved'
          sort=['id', 'category'],
          dist='received_at')
}}

select ...
alejandrofm commented 1 year ago

Sure! this is the config:

{{
    config(
        materialized='incremental',
        sql_where="TRUE",
        unique_key="events_date",
        incremental_strategy='delete+insert',
        sort=['column1', 'column2', 'column3'],
        dist='column1'
    )
}}
dbeatty10 commented 1 year ago

Thanks for providing this config @alejandrofm

I haven't tried it myself yet. Do you think it would be solved if https://github.com/dbt-labs/dbt-redshift/issues/335 is implemented? (Thanks for opening both of these issues, by the way! Good stuff.)

alejandrofm commented 1 year ago

Will solve it only if the ALTER COLUMN is done for increasing as well as decreasing field sizes. Increasing is "easy", decreasing needs one check to know if the old data fits on the new field size. but if both get implemented on #335 then this issue has no point anymore. Thanks!

github-actions[bot] commented 1 year ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

alejandrofm commented 1 year ago

I'm just bumping it so it doesn't get closed, it's a real issue for several use cases. Thanks

rriley99 commented 10 months ago

Any updates on fixing this behavior? It seems intermittent in my production environment.

colin-rogers-dbt commented 10 months ago

I think we should treat this as a new feature and fully support altering tables dist/sort keys in addition to columns similar to what we do now for materialized views.

dbeatty10 commented 10 months ago

@colin-rogers-dbt This might be the same/similar to https://github.com/dbt-labs/dbt-redshift/issues/335.

Do you think this could be the key part of the implementation (in addition to applicable test cases)?

{% macro redshift__alter_column_type(relation, column_name, new_column_type) -%}
  {% call statement('alter_column_type') %}
    alter table {{ relation }} alter column {{ adapter.quote(column_name) }} type {{ new_column_type }};
  {% endcall %}
{% endmacro %}
dbeatty10 commented 10 months ago

Do you think this could be the key part of the implementation (in addition to applicable test cases)?

Per @rriley99 here, alter won't work if the column is part of a distkey or sortkey:

I believe that will throw an error as you cannot alter dist/sort columns directly, you will need to do something like this:

create table new_table
   distkey (dist_col)
   sortkey (sort_col)
as
select *
from old_table;

Then swap the name. At least that is the best I could come up with for our ops engineers.