Open romiof opened 1 year ago
For what it’s worth, nested views perform horribly in SQL Server because of how the query planner works.
That being said, I really like the idea of supporting query hints as config in the macros.
Came in here from Google. We're experiencing the same issue: dreadfully slow performance in dbt. We're on SQL Server 2019 and dbt 1.4.6 (since dbt-sqlserver isn't on 1.5 yet). Adding option (force order) to the create_table_as macro unfortunately did not fix the issue, though:
{% macro sqlserver__create_table_as(temporary, relation, sql) -%}
{#- TODO: add contracts here when in dbt 1.5 -#}
{%- set sql_header = config.get('sql_header', none) -%}
{%- set as_columnstore = config.get('as_columnstore', default=true) -%}
{%- set temp_view_sql = sql.replace("'", "''") -%}
{%- set tmp_relation = relation.incorporate(
path={"identifier": relation.identifier.replace("#", "") ~ '_temp_view'},
type='view') -%}
{{- sql_header if sql_header is not none -}}
-- drop previous temp view
{{- sqlserver__drop_relation_script(tmp_relation) }}
-- create temp view
USE [{{ relation.database }}];
EXEC('create view {{ tmp_relation.include(database=False) }} as
{{ temp_view_sql }}
');
-- select into the table and create it that way
{# TempDB schema is ignored, always goes to dbo #}
SELECT *
INTO {{ relation.include(database=False, schema=(not temporary)) }}
FROM {{ tmp_relation }} option (force order)
-- drop temp view
{{ sqlserver__drop_relation_script(tmp_relation) }}
{%- if not temporary and as_columnstore -%}
-- add columnstore index
{{ sqlserver__create_clustered_columnstore_index(relation) }}
{%- endif -%}
{% endmacro %}
@edvald-kvika
Here, I've seen some models perform slowly without option (force order). About 10-15% of my models are in this situation.
For them, I created a config in the model code and made a copy of the sqlserver__create_table_as macro with an if statement that applies force order when my config is true.
At my job, our SQL Server is not dedicated to our DW. I have about a dozen other DBs used by other workloads (mostly OLTP, but also a couple of OLAP DBs, where our users connect Excel to SQL tables and analyze data with pivot tables).
So there are moments during the workday when dbt run hangs with slow performance, but that happens during resource contention in SQL Server.
In general, without bottlenecks in SQL Server, a dbt run model_abc takes about 1.5x to 2x the time needed to SELECT * from the model_abc SQL code.
@romiof we just patched sqlserver__create_table_as in a larger dbt project (adding option (force order)) and model performance time greatly improved, from around 900 seconds down to 30.
Now we have a bit more experience with option (force order). In some cases it drastically improves performance, and in other cases it does the exact opposite. Looking at the query plans for the select into query showed that in those cases option (force order) pushed the optimizer into a heavy (and early) table spool operation. Our solution was to add a config parameter, force_order:
{% macro sqlserver__create_table_as(temporary, relation, sql) -%}
{#- TODO: add contracts here when in dbt 1.5 -#}
{%- set sql_header = config.get('sql_header', none) -%}
{%- set as_columnstore = config.get('as_columnstore', default=true) -%}
{%- set force_order = config.get('force_order', default=true) -%}
{%- set temp_view_sql = sql.replace("'", "''") -%}
{%- set tmp_relation = relation.incorporate(
path={"identifier": relation.identifier.replace("#", "") ~ '_temp_view'},
type='view') -%}
{{- sql_header if sql_header is not none -}}
-- drop previous temp view
{{- sqlserver__drop_relation_script(tmp_relation) }}
-- create temp view
USE [{{ relation.database }}];
EXEC('create view {{ tmp_relation.include(database=False) }} as
{{ temp_view_sql }}
');
-- select into the table and create it that way
{# TempDB schema is ignored, always goes to dbo #}
SELECT *
INTO {{ relation.include(database=False, schema=(not temporary)) }}
{%- if force_order %}
-- add option (force order) to improve performance in nested views:
-- https://github.com/dbt-msft/dbt-sqlserver/issues/410
{%- endif %}
FROM {{ tmp_relation }} {%- if force_order %} option (force order) {%- endif %}
-- drop temp view
{{ sqlserver__drop_relation_script(tmp_relation) }}
{%- if not temporary and as_columnstore -%}
-- add columnstore index
{{ sqlserver__create_clustered_columnstore_index(relation) }}
{%- endif -%}
{% endmacro %}
Then, in the table config, we can turn it off where needed:
{{
config(
materialized = 'incremental',
unique_key = ['account', 'level', 'month'],
force_order = False
)
}}
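The same pattern could be generalized to the query-hint config idea raised earlier in this thread. What follows is only a minimal sketch, assuming a hypothetical `query_hints` config key (not an existing dbt-sqlserver option) that holds a list of hint strings. Only the final SELECT INTO statement of the patched macro would change:

```sql
{#- Hypothetical: 'query_hints' is an assumed config name, not part of dbt-sqlserver -#}
{%- set query_hints = config.get('query_hints', default=[]) -%}

SELECT *
INTO {{ relation.include(database=False, schema=(not temporary)) }}
FROM {{ tmp_relation }}
{%- if query_hints %}
-- emit the OPTION clause only when at least one hint is configured
OPTION ({{ query_hints | join(', ') }})
{%- endif %}
```

Under that assumption, a model would opt in with something like `config(materialized='table', query_hints=['FORCE ORDER'])`, and models that set nothing would get no OPTION clause at all, which matters given that the hint helps some models and hurts others.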
Hello!
This is a bug and a suggestion for improvement... I'm suffering from slow performance when materializing as a table.
My environment
I started using dbt for MS SQL this month. My database is SQL Server 2014. I'm using the latest version of dbt, 1.4.3.
My issue
I'm having problems with the materialization macro which creates the tables on our SQL Server. Its performance is very, very slow.
More specifically, my project consists of:
yml files, to map my tables to dbt objects
models materialized as tables
When I run dbt run / dbt build, it performs very poorly.
What I investigated
_Model__dbt_tmp_temp_view_ is a nested view, and this is the origin of my problem.
When I run SELECT * FROM _Model__dbt_tmp_temp_view_, everything works fine; the performance is the same as when I select without the temp view.
When I add INTO Model__dbt_tmp (https://github.com/dbt-msft/dbt-sqlserver/blob/master/dbt/include/sqlserver/macros/materializations/models/table/create_table_as.sql#LL23-L25), my SQL changes its plan and it takes about 10 minutes to execute this materialization.
(SELECT * FROM _Model__dbt_tmp_temp_view_ vs SELECT * INTO Model__dbt_tmp FROM _Model__dbt_tmp_temp_view_)
A possible workaround
I also discovered that there is a hint which makes
SELECT * INTO Model__dbt_tmp FROM _Model__dbt_tmp_temp_view_
run with its original performance: OPTION (FORCE ORDER)
I found it in this SO post.
So, if I run this query
SELECT * INTO Model__dbt_tmp FROM _Model__dbt_tmp_temp_view_ OPTION (FORCE ORDER)
from my MS SQL Studio, it creates my Model table in less than 1 minute.
Setting this option in dbt-sqlserver
I'd like to know how we can change this macro to make it possible to include this hint. Maybe a configuration, like as_columnstore?
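For anyone who wants to reproduce the comparison from the investigation above, the three statements can be timed side by side in SSMS. This is a rough sketch using the placeholder object names from this issue; it assumes the temp view still exists and that Model__dbt_tmp does not exist before each SELECT INTO:

```sql
-- Report CPU/elapsed time and I/O for each statement in the Messages tab
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- 1) Baseline: read from the temp view only
SELECT * FROM _Model__dbt_tmp_temp_view_;

-- 2) The materialization statement without any hint
SELECT * INTO Model__dbt_tmp FROM _Model__dbt_tmp_temp_view_;
DROP TABLE Model__dbt_tmp;  -- clean up so the next SELECT INTO can recreate it

-- 3) The same statement with the hint
SELECT * INTO Model__dbt_tmp FROM _Model__dbt_tmp_temp_view_ OPTION (FORCE ORDER);
DROP TABLE Model__dbt_tmp;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Enabling "Include Actual Execution Plan" in SSMS while running this should also show whether the plan shape changes between statements 1, 2 and 3.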