Closed jelstongreen closed 2 years ago
@jelstongreen Thanks for opening, and providing the clear reproduction case. Given the log message you're seeing, it looks like the concat()
call gets skipped over entirely. I'm not sure why that would be...
Which version of dbt_utils
do you have installed? I'm not able to reproduce this locally using the latest version. I see dbt correctly calling spark__concat
, and including it in the rendered code:
# packages
packages:
- package: dbt-labs/dbt_utils
version: 0.9.2
{% set native_query = hash(concat(['a','b','c'])) %}
{{ log("Without namespace: " ~ native_query, info=True) }}
{% set dbt_utils_query = dbt_utils.hash(dbt_utils.concat(['a','b','c'])) %}
{{ log("With dbt_utils namespace: " ~ dbt_utils_query, info=True) }}
SELECT 1
$ dbt parse
17:37:38 Running with dbt=1.2.1
17:37:38 Start parsing.
17:37:38 Dependencies loaded
17:37:38 ManifestLoader created
17:37:38 Without namespace: md5(cast(concat(a, b, c) as string))
17:37:38 Warning: the `concat` macro is now provided in dbt Core. It is no longer available in dbt_utils and backwards compatibility will be removed in a future version of the package. Use `concat` (no prefix) instead. The test.my_model model triggered this warning.
17:37:38 Warning: the `hash` macro is now provided in dbt Core. It is no longer available in dbt_utils and backwards compatibility will be removed in a future version of the package. Use `hash` (no prefix) instead. The test.my_model model triggered this warning.
17:37:38 With dbt_utils namespace: md5(cast(concat(a, b, c) as string))
17:37:38 Manifest loaded
17:37:38 Manifest checked
17:37:38 Flat graph built
17:37:38 Manifest loaded
17:37:38 Performance info: target/perf_info.json
17:37:38 Done.
...
Hey @jtcohen6 thanks for looking into this! These are my packages:
packages:
- package: dbt-labs/dbt_utils
version: 0.9.2
- package: dbt-labs/spark_utils
version: 0.3.0
- package: dbt-labs/codegen
version: 0.8.0
- package: brooklyn-data/dbt_artifacts
version: 2.0.0
I'm thinking this could be do with my dispatch config which I've not really updated since first starting out with dbt - does this need to be amended?
dispatch:
- macro_namespace: dbt_utils
search_order:
- my_project
- spark_utils
- dbt_utils
@jelstongreen Yeah! What happens if you remove that dispatch
config entirely? The migration of utility macros into dbt-core
should mean that "shim" packages like spark_utils
are no longer needed here
@jtcohen6 Unfortunately no change with that section removed:
23:47:14 Running with dbt=1.2.1
23:47:14 Start parsing.
23:47:14 Dependencies loaded
23:47:14 ManifestLoader created
23:47:17 Without namespace md5(cast(a, b, c as string))
23:47:17 Warning: the `concat` macro is now provided in dbt Core. It is no longer available in dbt_utils and backwards compatibility will be removed in a future version of the package. Use `concat` (no prefix) instead. The datalake_models.test model triggered this warning.
23:47:17 Warning: the `hash` macro is now provided in dbt Core. It is no longer available in dbt_utils and backwards compatibility will be removed in a future version of the package. Use `hash` (no prefix) instead. The datalake_models.test model triggered this warning.
23:47:17 With dbt_utils namespace: md5(cast(concat(a, b, c) as string))
23:47:31 Manifest loaded
23:47:31 Manifest checked
23:47:32 Flat graph built
23:47:32 Manifest loaded
23:47:32 Performance info: target/perf_info.json
23:47:32 Done.
I've worked this one out:
We had an internal macro called concat
which when we were using dbt_utils.concat would not conflict because of the explicit namespace. This was the same for all packages which prefixed their use of concat
with dbt_utils
. However, as these packages now use concat
with no package prefix it was conflicting with our internal macro. I'm not sure why our internal macro would just not provide any output at all in this case but I have renamed the macro to something more unique to our use case (concatenation with affixes).
Probably no need for any resolution here although it would be handy to know if an internal macro is conflicting with a built in macro of the same name, particularly after a version change as can be difficult to determine otherwise. Thanks for your help @jtcohen6 !
@jelstongreen Ah, thanks for the update!
Definitely open to thinking about how it could be easier to debug these ones in the future. It's true that there's more possibility for namespace collision when calling macros from the global namespace, though with the benefit of making it simpler for folks to override specific macro behavior when they see the need.
Unfortunately, because this override happens during macro resolution, rather than dispatch resolution (subtle distinction), there's no way to identify at parse time / via depends_on.macros
that dbt_utils.surrogate_key
will be calling my_project.concat
instead of dbt_utils.concat
. This is what appears in manifest.json
:
"macro.dbt_utils.default__surrogate_key": {
"unique_id": "macro.dbt_utils.default__surrogate_key",
"package_name": "dbt_utils",
"root_path": "/Users/jerco/dev/scratch/testy/dbt_packages/dbt_utils",
"path": "macros/sql/surrogate_key.sql",
"original_file_path": "macros/sql/surrogate_key.sql",
"name": "default__surrogate_key",
"macro_sql": "\n\n{%- macro default__surrogate_key(field_list) -%}\n\n{%- if varargs|length >= 1 or field_list is string %}\n\n{%- set error_message = '\nWarning: the `surrogate_key` macro now takes a single list argument instead of \\\nmultiple string arguments. Support for multiple string arguments will be \\\ndeprecated in a future release of dbt-utils. The {}.{} model triggered this warning. \\\n'.format(model.package_name, model.name) -%}\n\n{%- do exceptions.warn(error_message) -%}\n\n{# first argument is not included in varargs, so add first element to field_list_xf #}\n{%- set field_list_xf = [field_list] -%}\n\n{%- for field in varargs %}\n{%- set _ = field_list_xf.append(field) -%}\n{%- endfor -%}\n\n{%- else -%}\n\n{# if using list, just set field_list_xf as field_list #}\n{%- set field_list_xf = field_list -%}\n\n{%- endif -%}\n\n\n{%- set fields = [] -%}\n\n{%- for field in field_list_xf -%}\n\n {%- set _ = fields.append(\n \"coalesce(cast(\" ~ field ~ \" as \" ~ type_string() ~ \"), '')\"\n ) -%}\n\n {%- if not loop.last %}\n {%- set _ = fields.append(\"'-'\") -%}\n {%- endif -%}\n\n{%- endfor -%}\n\n{{ hash(concat(fields)) }}\n\n{%- endmacro -%}",
"resource_type": "macro",
"tags": [],
"depends_on": {
"macros": [
"macro.dbt_utils.type_string",
"macro.dbt_utils.hash",
"macro.dbt_utils.concat"
]
},
FYI @dbeatty10 @joellabes - this is an issue that would have been prevented by calling the macro explicitly as dbt.concat
, instead of just concat
, within dbt_utils.surrogate_key
. I know we've seen other issues with the former approach, though: #5720.
I'm going to close this specific issue as resolved for now.
Oh boy, I had no idea that dbt.something()
and something()
would give different results! I thought it was purely down to clarity or not.
Personally I prefer {{ dbt.something() }}
because it makes it more obvious where the magic is coming from (and I just had an issue where someone else was confused by this: https://github.com/dbt-labs/dbt-utils/issues/666). I think you and Doug decided the opposite in my absence and I was OK with that when I thought it didn't change anything.
Now that it does... 🤔 which failure case is more common/difficult to work around? This or the one raised in 5720?
Oh boy, I had no idea that dbt.something() and something() would give different results! I thought it was purely down to clarity or not.
@joellabes To be clear, they yield different results only if you've defined a macro in your own project named {{ concat() }}
, which then wins the race for global macro resolution. Whereas, if you've defined a macro in your project named {{ default__concat() }}
, that wins the race for dispatch in the dbt
namespace. I think the latter is preferable, because it's a bit harder to find yourself there accidentally, and the dependency shows up in depends_on.macros
.
So, I also find myself leaning toward explicit namespaces... I don't remember exactly why @dbeatty10 and I opted for prefixless, but knowing us, there's definitely a thread somewhere.
The bug in https://github.com/dbt-labs/dbt-core/issues/5720 is decidedly a bug, not just a point of user confusion — it's something we can try to fix. It's a gnarly bug, though. We can invest more time in trying to resolve it, which would clear the way toward us being able to recommend explicit namespaces unambiguously, but it won't be fixed by dbt-core v1.3 + dbt-utils v1.0. At the same time, the issue will just go away when we remove the deprecated back-compat macros from dbt_utils
.
Wholeheartedly agree with using explicit namespaces IFF we can't avoid bugs otherwise! i.e., {{ dbt.type_string() }}
rather than {{ type_string() }}
@jtcohen6 our original discussion was basically this:
- Include dbt. prefix for macros? Or drop?
- The result feels more readable without the prefix to me.
I prefer without the prefix, too.
An underlying assumption for me was that with/without prefix was equivalent. We should be explicit if we will have bugs otherwise. We will need to add dbt.
prefixes to all the code examples here.
To fully unpack my ordered preferences for syntax (if we could create our ideal world):
select concat(['d', 'b', 't'])
select {{ concat(['d', 'b', 't']) }}
select {{ dbt.concat(['d', 'b', 't']) }}
In the first instance, all the Jinja has melted away as-if concat
were a SQL standard function implemented by all databases (or if there existed some kind of black magic that made it appear that way 😉 ).
@dbeatty10 I think I may have a fix for the bug that crops up when users explicitly specify dbt.*
: https://github.com/dbt-labs/dbt-core/pull/5907
Given that, I'm inclined to encourage explicit dbt.*
in user projects and packages, to avoid unintended namespace collisions. (We're not the only people who have this problem: https://stackoverflow.com/questions/70458086/how-to-correctly-import-pyspark-sql-functions)
We will need to add
dbt.
prefixes to all the code examples here.
✅ We can open an issue for this.
@joellabes I imagine we'd also want to update the references in dbt-utils
, which might be a much bigger lift, and warrants opening a separate issue.
We will need to add
dbt.
prefixes to all the code examples here.✅ We can open an issue for this.
I already have a PR open for this page in our documentation, so I'll add it there:
To fully unpack my ordered preferences for syntax (if we could create our ideal world):
My preferences are actually the exact inverse of yours! If we are being clever, then I want to see where the magic is coming from. Mostly because:
as-if
concat
were a SQL standard function implemented by all databases
is a great greenfield stance to take, but it's already a SQL standard function implemented by a lot of databases, so this would open us up to wild amounts of clashes - it would really only be possible with the dbt proxy server rewriting everything before the warehouse saw it. Which isn't impossible I guess! But certainly a surprise for the new employee when they discover that something is swizzling their functions from under them. Maybe I'm thinking too small? I'm open to the scales falling from my eyes
I think I may have a fix for the bug that crops up when users explicitly specify
dbt.*
: #5907
:infinitypartyparrot:
Is this a new bug in dbt-core?
Current Behavior
compiles to:
Expected Behavior
The behaviour should mimic the previously used :
which compiles too:
Steps To Reproduce
Create an example model containing:
Relevant log output
Which database adapter are you using with dbt?
spark
Additional Context
Databricks