dbt-msft / tsql-utils

dbt-utils for the dbt-msft family of packages
MIT License
25 stars 25 forks source link

Add support for bigger varchar field in surrogate key #14

Closed jkock closed 3 years ago

jkock commented 3 years ago

Changed line 33 to allow for bigger varchar fields. Original:

cast(" ~ field ~ " as " ~ dbt_utils.string() ~ ")

which translates to

CAST( {{ field }} AS varchar(30))

New

CAST( {{ field }} AS varchar(max))

dataders commented 3 years ago

newb question, but length 30 is what is used by the other adapters (BigQuery Redshift and Snowflake)? My guess is these databases don't have a max option?

jkock commented 3 years ago

newb question, but length 30 is what is used by the other adapters (BigQuery Redshift and Snowflake)? My guess is these databases don't have a max option?

These databases generally do not have a length, since they are column store based and use compression which is independent of length

For SQL server, the default length is 30 when not specified when using CAST or CONVERT, see also this link.

To be even more precise, the length is not characters, but bytes, which with an encoding of for example UTF-8 can be different than the number of characters

I would have liked to change the dbt_utils.string() function, but it does not accept any parameters, so I think this is the best bet.

dataders commented 3 years ago

thanks for the explanation!

Just curious what parameters would you like dbt_utils.string() to accept?

in a normal PR, I'd have you delete this this line so the test passes, but this won't work until we get https://github.com/fishtown-analytics/dbt-utils/pull/312 through so that assert_equal is working. https://github.com/dbt-msft/tsql-utils/blob/e0e6914fa47988ec9e59e8facc0fe8fc738b0754/integration_tests/dbt_utils/dbt_project.yml#L57

alittlesliceoftom commented 3 years ago

This change looks excellent in that it solves the problem we have.

It does slightly increase the complexity of the project as we're changing deep in the file, very customised.

We could instead consider setting in the datatypes of tsql-utils - https://github.com/dbt-msft/tsql-utils/blob/master/macros/dbt_utils/cross_db_utils/datatypes.sql

Simply changing VARCHAR to VARCHAR(MAX).

I'd slightly prefer at this point to set to a smaller number like 8000 (max without offering risk of going into LOB) or 900 (max whilst still allowing indexes to set on it.) - https://stackoverflow.com/questions/2091284/varcharmax-everywhere

What do you think of setting our default string to VARCHAR(900)

alittlesliceoftom commented 3 years ago

Added the alternative for your consideration. https://github.com/dbt-msft/tsql-utils/pull/15

dataders commented 3 years ago

moving forward with #15 instead, until we think of a better way to override dbt_utils.string()