dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
9.84k stars 1.62k forks source link

[CT-742] No `{{ log() }}` during parsing #5360

Open jtcohen6 opened 2 years ago

jtcohen6 commented 2 years ago

Context / prompt: https://github.com/dbt-labs/dbt-core/issues/5358#issuecomment-1152444493

https://github.com/dbt-labs/dbt-core/blob/72b6a80b07b3acc9aa64ae65c5d82191dbb1aa11/core/dbt/context/base.py#L543-L562

dbt shouldn't log when it's parsing model code (rendering Jinja) to capture ref / source / config. Or, it shouldn't do it by default. We could think about an optional argument.

-- macros/run_a_query.sql

{% macro run_a_query() %}

  {{ log("I'm running a query now", info = true) }}
  {% set result = run_query("select 1 as id") %}

  select 1 as id

{% endmacro %}
-- models/my_model.sql

{{ run_a_query() }}

select 1 as id
$ dbt run
14:58:16  Running with dbt=1.2.0-a1
14:58:16  I'm running a query now.  <------ this one is misleading!!!! no query is actually being run
14:58:16  Found 1 model, 0 tests, 0 snapshots, 0 analyses, 455 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
14:58:16
14:58:16  Concurrency: 5 threads (target='dev')
14:58:16
14:58:16  1 of 1 START view model dbt_jcohen.my_model .................................... [RUN]
14:58:16  I'm running a query now
jtcohen6 commented 2 years ago

I was able to do a naive version of this by reimplementing (= copy-pasting) the log() context method from the BaseContext into the ProviderContext, since the latter knows how to differentiate between parse time vs. execute time:

    def log(msg: str, info: bool = False) -> str:
        """Logs a line to either the log file or stdout.

        :param msg: The message to log
        :param info: If `False`, write to the log file. If `True`, write to
            both the log file and stdout.

        > macros/my_log_macro.sql

            {% macro some_macro(arg1, arg2) %}
              {{ log("Running some_macro: " ~ arg1 ~ ", " ~ arg2) }}
            {% endmacro %}"
        """
        if not self.provider.execute:
            return ""
        if info:
            fire_event(MacroEventInfo(msg=msg))
        else:
            fire_event(MacroEventDebug(msg=msg))
        return ""

Voila:

$ dbt run
15:22:48  Running with dbt=1.2.0-a1
15:22:48  Found 1 model, 0 tests, 0 snapshots, 0 analyses, 415 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
15:22:48
15:22:49  Concurrency: 5 threads (target='dev')
15:22:49
15:22:49  1 of 1 START view model dbt_jcohen.my_model .................................... [RUN]
15:22:49  I'm running a query now
leahwicz commented 2 years ago

@gshank what do you think about @jtcohen6's implementation here? Is there a better way to do this or is this ok to proceed with?

jtcohen6 commented 2 years ago

The limitation here is that log is defined in the base context, which does not yet know about execute (established in ProviderContext). There might be a way to keep all the logic in base.py, if there's a dynamic way to determine whether the parent class is BaseContext or ProviderContext, or an extra argument that can be passed.

If the approach above is the one we want to go with (because it's simplest/easiest), I think this could be a good first issue for a community contributor. I just don't want to tag it as such, if it's a bad implementation that we wouldn't merge for code-smell reasons :)

github-actions[bot] commented 1 year ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

gshank commented 1 year ago

The way that we normally do parse vs runtime jinja rendering differences is that we provide different versions of methods using different "providers". That method would work for this too.

raulgonzalezduran commented 1 year ago

Any intention to work on this? It is very confusing to have logs in parsing

gabe-trejo commented 6 months ago

also interested in a solution to this problem, the noise definitely creates confusion