dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

`log_column_level_metadata` macro raises error when `ref` macro is overridden in dbt adaptor #22358

Open · maxfirman opened this issue 4 months ago

maxfirman commented 4 months ago

Dagster version

1.7.8

What's the issue?

A line of code in the log_column_level_metadata macro raises the following exception when running against the dbt-dremio adaptor:

Compilation Error in model xxxxxxxx (path/to/xxxxxxxx.sql) macro 'dbt_macro__ref' takes no keyword argument 'package'

in macro log_column_level_metadata (macros/log_column_level_metadata.sql) called by macro run_hooks (macros/materializations/hooks.sql) called by macro materialization_view_dremio (macros/materializations/view/view.sql) called by model xxxxxxxx (path/to/xxxxxxxx.sql)

The problem arises because the ref macro is overridden in the dbt-dremio adaptor (see here), and the log_column_level_metadata macro calls ref with a "package" keyword argument that the overridden macro does not accept.
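The error text comes from Jinja2 itself: dbt macros are Jinja macros (the traceback later in this thread runs through jinja2/runtime.py), and in the real error dbt's compiled name for the macro appears as 'dbt_macro__ref'. A minimal sketch, assuming only the jinja2 package (a dbt dependency) is installed: a macro with a fixed signature, standing in for the overridden ref, rejects any keyword it does not declare. The macro body and model/project names are hypothetical placeholders, not the dbt-dremio source.

```python
from jinja2 import Environment

# Hypothetical stand-in for an overridden ref macro with a fixed signature.
# It declares only `model_name` and never references `kwargs`, so Jinja
# raises TypeError for any undeclared keyword argument such as `package`.
FIXED_REF_TEMPLATE = """
{%- macro ref(model_name) -%}{{ model_name }}{%- endmacro -%}
{{ ref('my_model', package='my_project') }}
"""


def render_or_error(source: str) -> str:
    """Render a Jinja template; return the TypeError message if rendering fails."""
    try:
        return Environment().from_string(source).render().strip()
    except TypeError as exc:
        return str(exc)


if __name__ == "__main__":
    # Prints: macro 'ref' takes no keyword argument 'package'
    print(render_or_error(FIXED_REF_TEMPLATE))
```

The message matches the one in the report apart from the macro name, which dbt rewrites when it compiles macros.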

What did you expect to happen?

The log_column_level_metadata macro should not call ref with the "package" keyword argument. This way of calling ref is not documented as part of the public API; see the dbt documentation for examples of how ref is expected to be called.
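For reference, the documented call forms of ref are the one-argument form ref('model_name') and the two-argument form ref('package_or_project', 'model_name'), plus an optional version keyword (version= or v=) in dbt 1.5 and later. The hypothetical Python helper below mirrors only those forms to show why a package= keyword falls outside them; it is an illustration, not dbt's implementation.

```python
from typing import Optional, Tuple, Union

Version = Union[str, int, None]


def parse_ref_args(
    *args: str, version: Version = None, v: Version = None
) -> Tuple[Optional[str], str, Version]:
    """Hypothetical helper mirroring the documented ref() call forms.

    ref('model')             -> (None, 'model', None)
    ref('package', 'model')  -> ('package', 'model', None)
    ref('model', version=2)  -> (None, 'model', 2)
    Any other keyword, such as package=, raises TypeError because it is
    not declared in the signature.
    """
    if len(args) == 1:
        package, model = None, args[0]
    elif len(args) == 2:
        package, model = args
    else:
        raise TypeError(f"ref() takes 1 or 2 positional arguments, got {len(args)}")
    # `version` and its short alias `v` are interchangeable here.
    return package, model, version if version is not None else v
```

Under these assumptions, a caller that needs a package-qualified reference would use the two-argument positional form rather than a package= keyword.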

How to reproduce?

Might be hard to reproduce without standing up a Dremio instance.

Deployment type

Dagster Cloud

Deployment details

No response

Additional information

I'm happy to work on a fix for this issue, although I won't be able to make a start for the next week or so.


rexledesma commented 3 months ago

We have a new way of collecting this column-level metadata that doesn't need the Dagster dbt package. Mind trying it out to see if it resolves your issue?

Instead of using the log_column_level_metadata macro, you can request the column-level metadata from Python by chaining .fetch_column_metadata() onto the dbt CLI event stream.

You'll need dagster>=1.7.9 and dagster-dbt>=0.23.9 to try this out.
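A quick way to confirm an environment meets these minimums before switching to .fetch_column_metadata(). This is a hedged sketch using only the standard library: release_tuple, meets_minimum and check_installed are hypothetical helpers, and the comparison looks only at the numeric release segments, so a pre-release such as 1.7.9rc1 is treated as equal to 1.7.9. A stricter check would use a proper version parser such as packaging.version.Version.

```python
import re
from importlib import metadata

# Minimum versions quoted in the comment above.
MINIMUMS = {"dagster": "1.7.9", "dagster-dbt": "0.23.9"}


def release_tuple(version: str) -> tuple:
    """Leading numeric release segments, e.g. '1.7.10rc1' -> (1, 7, 10)."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed release segments are >= the minimum's."""
    return release_tuple(installed) >= release_tuple(minimum)


def check_installed() -> dict:
    """Map each package name to whether its installed version meets the minimum."""
    results = {}
    for package, minimum in MINIMUMS.items():
        try:
            results[package] = meets_minimum(metadata.version(package), minimum)
        except metadata.PackageNotFoundError:
            # Not installed at all, so it cannot meet the minimum.
            results[package] = False
    return results
```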

from dagster import AssetExecutionContext
from dagster_dbt import dbt_assets, DbtCliResource

@dbt_assets(manifest=...)
def my_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    # Run `dbt build`, stream its events, and fetch the column schema of each built model
    yield from dbt.cli(["build"], context=context).stream().fetch_column_metadata()

maxfirman commented 3 months ago

Thanks @rexledesma. I'll try this out next week and let you know the outcome.

I've also opened the ticket https://github.com/dremio/dbt-dremio/issues/232 to fix the overridden ref macro in the dbt-dremio adaptor.

maxfirman commented 3 months ago

Hi @rexledesma. I finally got around to trying the fetch_column_metadata method as suggested. Unfortunately, it raised the error below.

I haven't spent much time trying to understand the error, but I suspect it is an issue in the dbt-dremio adaptor.

I think the best path forward is for me to contribute a fix to the ref macro in the dbt-dremio adaptor so that it can be called with a "package" keyword argument. This should resolve the original error in the log_column_level_metadata macro without requiring any code changes on the Dagster side.

dagster._core.errors.DagsterExecutionStepExecutionError: Error occurred while executing op "dp_model_dbt_assets":

  File "/opt/venv/lib/python3.10/site-packages/dagster/_core/execution/plan/execute_plan.py", line 282, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/opt/venv/lib/python3.10/site-packages/dagster/_core/execution/plan/execute_step.py", line 501, in core_dagster_event_sequence_for_step
    for user_event in _step_output_error_checked_user_event_sequence(
  File "/opt/venv/lib/python3.10/site-packages/dagster/_core/execution/plan/execute_step.py", line 186, in _step_output_error_checked_user_event_sequence
    for user_event in user_event_sequence:
  File "/opt/venv/lib/python3.10/site-packages/dagster/_core/execution/plan/execute_step.py", line 91, in _process_asset_results_to_events
    for user_event in user_event_sequence:
  File "/opt/venv/lib/python3.10/site-packages/dagster/_core/execution/plan/compute.py", line 195, in execute_core_compute
    for step_output in _yield_compute_results(step_context, inputs, compute_fn, compute_context):
  File "/opt/venv/lib/python3.10/site-packages/dagster/_core/execution/plan/compute.py", line 164, in _yield_compute_results
    for event in iterate_with_context(
  File "/opt/venv/lib/python3.10/site-packages/dagster/_utils/__init__.py", line 466, in iterate_with_context
    with context_fn():
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/opt/venv/lib/python3.10/site-packages/dagster/_core/execution/plan/utils.py", line 84, in op_execution_error_boundary
    raise error_cls(

The above exception was caused by the following exception:
dbt.exceptions.CaughtMacroErrorWithNodeError: Compilation Error
  'None' has no attribute 'database'

  > in macro get_columns_in_relation (macros/adapters/columns.sql)
  > called by <Unknown>

  File "/opt/venv/lib/python3.10/site-packages/dagster/_core/execution/plan/utils.py", line 54, in op_execution_error_boundary
    yield
  File "/opt/venv/lib/python3.10/site-packages/dagster/_utils/__init__.py", line 468, in iterate_with_context
    next_output = next(iterator)
  File "/opt/spark/work-dir/dp_dagster/assets/dbt/dp_model.py", line 65, in dp_model_dbt_assets
    yield from dbt.cli(
  File "/opt/venv/lib/python3.10/site-packages/dagster_dbt/core/resources_v2.py", line 1068, in __next__
    return next(self._inner_iterator)
  File "/opt/venv/lib/python3.10/site-packages/dagster_dbt/core/resources_v2.py", line 1160, in _threadpool_wrap_map_fn
    yield from imap(
  File "/opt/venv/lib/python3.10/site-packages/dagster_dbt/core/utils.py", line 356, in imap
    yield current_work_item.result(timeout=0.1)
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/venv/lib/python3.10/site-packages/dagster_dbt/core/resources_v2.py", line 1134, in _map_fn
    result = fn(self._dbt_cli_invocation, event)
  File "/opt/venv/lib/python3.10/site-packages/dagster_dbt/core/resources_v2.py", line 1111, in <lambda>
    fetch_metadata = lambda invocation, event: _fetch_column_metadata(
  File "/opt/venv/lib/python3.10/site-packages/dagster_dbt/core/resources_v2.py", line 938, in _fetch_column_metadata
    cols: List[BaseColumn] = adapter.get_columns_in_relation(relation=relation)
  File "/opt/venv/lib/python3.10/site-packages/dbt/adapters/sql/impl.py", line 154, in get_columns_in_relation
    return self.execute_macro(
  File "/opt/venv/lib/python3.10/site-packages/dbt/adapters/base/impl.py", line 1112, in execute_macro
    result = macro_function(**kwargs)
  File "/opt/venv/lib/python3.10/site-packages/dbt/clients/jinja.py", line 330, in __call__
    return self.call_macro(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/dbt/clients/jinja.py", line 257, in call_macro
    return macro(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/jinja2/runtime.py", line 763, in __call__
    return self._invoke(arguments, autoescape)
  File "/opt/venv/lib/python3.10/site-packages/jinja2/runtime.py", line 777, in _invoke
    rv = self._func(*arguments)
  File "<template>", line 20, in macro
  File "/opt/venv/lib/python3.10/site-packages/jinja2/sandbox.py", line 393, in call
    return __context.call(__obj, *args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/jinja2/runtime.py", line 298, in call
    return __obj(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/dbt/clients/jinja.py", line 330, in __call__
    return self.call_macro(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/dbt/clients/jinja.py", line 255, in call_macro
    with self.exception_handler():
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/opt/venv/lib/python3.10/site-packages/dbt/clients/jinja.py", line 304, in exception_handler
    raise CaughtMacroErrorWithNodeError(exc=e, node=self.macro)

The above exception occurred during handling of the following exception:
jinja2.exceptions.UndefinedError: 'None' has no attribute 'database'

  File "/opt/venv/lib/python3.10/site-packages/dbt/clients/jinja.py", line 302, in exception_handler
    yield
  File "/opt/venv/lib/python3.10/site-packages/dbt/clients/jinja.py", line 257, in call_macro
    return macro(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/jinja2/runtime.py", line 763, in __call__
    return self._invoke(arguments, autoescape)
  File "/opt/venv/lib/python3.10/site-packages/jinja2/runtime.py", line 777, in _invoke
    rv = self._func(*arguments)
  File "<template>", line 22, in macro
  File "/opt/venv/lib/python3.10/site-packages/jinja2/sandbox.py", line 326, in getattr
    value = getattr(obj, attribute)
  File "/opt/venv/lib/python3.10/site-packages/jinja2/runtime.py", line 859, in __getattr__
    return self._fail_with_undefined_error()
  File "/opt/venv/lib/python3.10/site-packages/jinja2/runtime.py", line 852, in _fail_with_undefined_error
    raise self._undefined_exception(self._undefined_message)
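The fix proposed above (making the dbt-dremio ref override accept a "package" keyword argument) can be sketched with plain Jinja2, assuming only the jinja2 package is installed. A Jinja macro whose body references the special varargs and kwargs variables accepts extra positional and keyword arguments instead of raising TypeError, and can forward them unchanged. In dbt the forwarding target would be builtins.ref; here builtin_ref is a hypothetical stand-in that just echoes what it receives, and none of this is the actual dbt-dremio patch.

```python
from jinja2 import Environment


def builtin_ref(*args: str, **kwargs: str) -> str:
    """Hypothetical stand-in for dbt's builtins.ref: echoes package.model."""
    package = kwargs.get("package")
    if len(args) == 2:
        package, model = args
    else:
        model = args[0]
    return f"{package or 'default'}.{model}"


# Because the macro body references `varargs` and `kwargs`, Jinja lets the
# macro accept any extra positional or keyword arguments and forward them.
OVERRIDE_REF = """
{%- macro ref() -%}
{{- builtin_ref(*varargs, **kwargs) -}}
{%- endmacro -%}
"""


def call_ref(call: str) -> str:
    """Render a single ref(...) call against the forwarding override."""
    env = Environment()
    env.globals["builtin_ref"] = builtin_ref
    return env.from_string(OVERRIDE_REF + "{{ " + call + " }}").render().strip()


if __name__ == "__main__":
    print(call_ref("ref('orders')"))
    print(call_ref("ref('orders', package='jaffle_shop')"))
    print(call_ref("ref('jaffle_shop', 'orders')"))
```

Forwarding everything keeps the override compatible with whatever call forms the built-in ref supports, rather than hard-coding one signature.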