dbt-msft / dbt-sqlserver

dbt adapter for SQL Server and Azure SQL
MIT License

Slow performance when materialization as table (`SELECT * INTO Model__dbt_tmp FROM Model__dbt_tmp_temp_view`) #410

Open romiof opened 1 year ago

romiof commented 1 year ago

Hello!

This is both a bug report and a suggestion for improvement: I'm seeing slow performance when materializing models as tables.

My environment

I started using dbt with MS SQL this month. My database is SQL Server 2014, and I'm using the latest version of dbt, 1.4.3.

My issue

I'm having problems with the materialization macro that creates the tables on our SQL Server. Its performance is very, very slow.

More specifically, my project consists of:

What I investigated

A possible workaround

I also discovered that there is a query hint which makes SELECT * INTO Model__dbt_tmp FROM Model__dbt_tmp_temp_view run with its original performance: OPTION (FORCE ORDER)

I found it in this SO post.

So, if I run the query SELECT * INTO Model__dbt_tmp FROM Model__dbt_tmp_temp_view OPTION (FORCE ORDER) from SQL Server Management Studio, it creates my Model table in less than 1 minute.
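A minimal sketch of how the two variants could be timed side by side in SSMS. The object names follow the issue text; the `dbo` schema is an assumption, and the temp view has to be recreated by hand first, since dbt drops it at the end of a run.

```sql
-- Sketch only: the dbo schema is an assumption; adjust it to your target schema.
-- Model__dbt_tmp_temp_view must already exist (dbt drops it after each run).
SET STATISTICS TIME ON;
GO

-- Variant 1: plain SELECT ... INTO, the statement the macro generates.
IF OBJECT_ID('dbo.Model__dbt_tmp', 'U') IS NOT NULL
    DROP TABLE dbo.Model__dbt_tmp;
GO

SELECT *
INTO dbo.Model__dbt_tmp
FROM dbo.Model__dbt_tmp_temp_view;
GO

-- Variant 2: the same statement with the FORCE ORDER query hint.
IF OBJECT_ID('dbo.Model__dbt_tmp', 'U') IS NOT NULL
    DROP TABLE dbo.Model__dbt_tmp;
GO

SELECT *
INTO dbo.Model__dbt_tmp
FROM dbo.Model__dbt_tmp_temp_view
OPTION (FORCE ORDER);
GO

SET STATISTICS TIME OFF;
GO
```

The CPU and elapsed times appear in the Messages tab. The `IF OBJECT_ID(...)` check is used instead of `DROP TABLE IF EXISTS` because the latter requires SQL Server 2016 or later.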

Setting this option at dbt-sqlserver

I'd like to know how we can change this macro to make it possible to include this hint. Maybe a configuration option, like as_columnstore?

matsonj commented 1 year ago

For what it’s worth, nested views perform horribly in SQL Server because of how the query planner works.

That being said, I really like the idea of supporting query hints as config in the macros.

edvald-kvika commented 1 year ago

Came in here from Google. We're experiencing the same issue: dreadfully slow performance in dbt. We're on SQL Server 2019 and dbt 1.4.6 (since dbt-sqlserver isn't on 1.5 yet). Unfortunately, adding option (force order) to the create_table_as macro did not fix the issue, though:

{% macro sqlserver__create_table_as(temporary, relation, sql) -%}
   {#- TODO: add contracts here when in dbt 1.5 -#}
   {%- set sql_header = config.get('sql_header', none) -%}
   {%- set as_columnstore = config.get('as_columnstore', default=true) -%}
   {%- set temp_view_sql = sql.replace("'", "''") -%}
   {%- set tmp_relation = relation.incorporate(
        path={"identifier": relation.identifier.replace("#", "") ~ '_temp_view'},
        type='view') -%}

   {{- sql_header if sql_header is not none -}}

    -- drop previous temp view
   {{- sqlserver__drop_relation_script(tmp_relation) }}

    -- create temp view
   USE [{{ relation.database }}];
   EXEC('create view {{ tmp_relation.include(database=False) }} as
    {{ temp_view_sql }}
    ');

   -- select into the table and create it that way
   {# TempDB schema is ignored, always goes to dbo #}
   SELECT *
   INTO {{ relation.include(database=False, schema=(not temporary))  }}
   FROM {{ tmp_relation }} option (force order)
   -- drop temp view
   {{ sqlserver__drop_relation_script(tmp_relation) }}

   {%- if not temporary and as_columnstore -%}
        -- add columnstore index
        {{ sqlserver__create_clustered_columnstore_index(relation) }}
   {%- endif -%}

{% endmacro %}

romiof commented 1 year ago

> Came in here from Google. We're experiencing the same issue: dreadfully slow performance in dbt. We're on SQL Server 2019 and dbt 1.4.6 (since dbt-sqlserver isn't on 1.5 yet). Unfortunately, adding option (force order) to the create_table_as macro did not fix the issue, though:

@edvald-kvika

Here, I've seen slow performance without option (force order) on some models. About 10~15% of my models are in this situation.

For those, I added a config in the model code and made a copy of the sqlserver__create_table_as macro with an if statement that applies force order when my config is true.

At my job, our SQL Server is not dedicated to our DW. It hosts about a dozen other DBs used by other workloads (mostly OLTP, but also a couple of OLAP DBs, where our users connect Excel to SQL tables and analyze data with Pivot Tables).

So there are moments during the workday when dbt run hangs with slow performance, but that happens because of resource contention in SQL Server.

In general, without bottlenecks in SQL Server, dbt run for model_abc takes about 1.5x to 2x the time needed to run the model_abc SQL code as a plain SELECT *.

edvald-kvika commented 1 year ago

@romiof we just patched sqlserver__create_table_as in a larger dbt project (adding option (force order)) and model run time improved greatly, from around 900 seconds down to 30.

edvald-kvika commented 1 year ago

Now we have a bit more experience with option (force order). In some cases it drastically improves performance, and in other cases it does the exact opposite. Looking at the query plans for the select into query showed that, in the cases where it got slower, option (force order) pushed the optimizer into a heavy (and early) table spool operation. Our solution was to add a config parameter for force_order:

{% macro sqlserver__create_table_as(temporary, relation, sql) -%}
   {#- TODO: add contracts here when in dbt 1.5 -#}
   {%- set sql_header = config.get('sql_header', none) -%}
   {%- set as_columnstore = config.get('as_columnstore', default=true) -%}
   {%- set force_order = config.get('force_order', default=true) -%}
   {%- set temp_view_sql = sql.replace("'", "''") -%}
   {%- set tmp_relation = relation.incorporate(
        path={"identifier": relation.identifier.replace("#", "") ~ '_temp_view'},
        type='view') -%}

   {{- sql_header if sql_header is not none -}}

    -- drop previous temp view
   {{- sqlserver__drop_relation_script(tmp_relation) }}

    -- create temp view
   USE [{{ relation.database }}];
   EXEC('create view {{ tmp_relation.include(database=False) }} as
    {{ temp_view_sql }}
    ');

   -- select into the table and create it that way
   {# TempDB schema is ignored, always goes to dbo #}
   SELECT *
   INTO {{ relation.include(database=False, schema=(not temporary))  }}
   {%- if force_order %}
   -- add option (force order) to improve performance in nested views:
   -- https://github.com/dbt-msft/dbt-sqlserver/issues/410
   {%- endif %}
   FROM {{ tmp_relation }} {%- if force_order %} option (force order) {%- endif %}
   -- drop temp view
   {{ sqlserver__drop_relation_script(tmp_relation) }}

   {%- if not temporary and as_columnstore -%}
        -- add columnstore index
        {{ sqlserver__create_clustered_columnstore_index(relation) }}
   {%- endif -%}

{% endmacro %}

Then, in the table config, we can turn it off where needed:

{{
  config(
    materialized = 'incremental',
    unique_key = ['account', 'level', 'month'],
    force_order = False
    )
}}
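To decide which models need force_order = False, the plan check described above can be done without actually running the slow statement. A minimal sketch, assuming the `dbo` schema and the object names used earlier in this thread, and that the temp view currently exists:

```sql
-- Sketch only: the dbo schema and object names are assumptions for illustration.
-- SET SHOWPLAN_XML must be the only statement in its batch, hence the GO separators.
SET SHOWPLAN_XML ON;
GO

-- Returns the estimated plan as XML; the statement itself is not executed.
SELECT *
INTO dbo.Model__dbt_tmp
FROM dbo.Model__dbt_tmp_temp_view
OPTION (FORCE ORDER);
GO

SET SHOWPLAN_XML OFF;
GO
```

In the returned plan, operators with PhysicalOp="Table Spool" near the start of the join order are the pattern to look for. Running the same batch without the OPTION clause gives the plan to compare against.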