dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[CT-3049] [Bug] Unable to override default materialization for python models (dbt >1.4)/Hierarchical Configuration Not Applied #8520

Open devmessias opened 1 year ago

devmessias commented 1 year ago

Is this a new bug in dbt-core?

Current Behavior

In my company's workflow, we define all configurations, metadata, etc. in the schema.yml file, with broader configurations described in dbt_project.yml. However, the materialization of Python models cannot be controlled through these files. This requires us to set the configuration within the model itself, as demonstrated below:

def model(dbt, session):
    dbt.config(materialized='incremental')
    ...

If this isn't done, the materialization always defaults to 'table' in the generated manifest.json.

commit that introduces this behavior: https://github.com/dbt-labs/dbt-core/blame/48d04e81417195393af5af1f78ef695f3398f193/core/dbt/parser/base.py#L217

Expected Behavior

I would expect the materialization of Python models to be controllable through these configuration files, just as it is for SQL models, without having to specify it within the model itself. Configurations defined in these files should be respected, and the materialization shouldn't default to 'table' in the generated manifest.json unless explicitly set to do so. The current behavior makes the way dbt models are configured and managed inconsistent.

Steps To Reproduce

  1. Ensure you are using dbt version 1.4 or newer.

  2. Create a Python model with a materialization setting other than 'table' specified either in dbt_project.yml or schema.yml.

  3. Execute dbt run. Observe in the terminal that the materialization is displayed as 'table'.

  4. Upon inspecting the manifest.json, you'll find that all other configurations are respected, except for the materialization which defaults to 'table'.
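The check in step 4 can be scripted. The following is a minimal sketch, assuming the manifest keeps its nodes under a top-level `nodes` key and that each model node carries `resource_type`, `language`, and `config.materialized` fields; the helper names are hypothetical and not part of dbt.

```python
import json
from pathlib import Path


def load_manifest(path: str = "target/manifest.json") -> dict:
    """Read the manifest that dbt writes after `dbt parse` or `dbt run`."""
    return json.loads(Path(path).read_text())


def python_model_materializations(manifest: dict) -> dict:
    """Map each Python model's unique_id to its resolved materialization.

    Assumes the node fields named in the lead-in; nodes that are missing
    them are skipped or reported as None rather than raising.
    """
    result = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, ...
        if node.get("language") != "python":
            continue  # SQL models are not affected by this issue
        result[unique_id] = node.get("config", {}).get("materialized")
    return result
```

Calling `python_model_materializations(load_manifest())` from the project root should then list the materialization that dbt resolved for every Python model, which makes it easy to compare against what schema.yml or dbt_project.yml requested.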

Relevant log output

No response

Environment

- OS: Ubuntu 20.04.6 LTS
- Python: 3.10.9
- dbt: (tested with multiple versions: 1.4, 1.5, and 1.6.1)

Which database adapter are you using with dbt?

spark, other (mention it in "Additional Context")

Additional Context

Database adapter: dbt-databricks

I have some workarounds for this. If this issue is accepted, I can send a PR.

dbeatty10 commented 1 year ago

Thanks for reporting this @devmessias !

Using the dbt project here as a starting point, I was able to reproduce what you described.

Namely, the manifest showed "materialized": "incremental" when using dbt 1.3.x, but showed "materialized": "table" for dbt 1.4.x through 1.6.x.

Here's the customized YAML file that I used:

models/_models.yml

version: 2

models:
  - name: transactions
    config:
      materialized: incremental

Although I didn't try it out for dbt 1.4 through 1.6, I'm assuming we'd see something similar with the following dbt_project.yml:

version: "1.0.0"
config-version: 2
profile: "sandcastle"

models:
  my_project:
    +materialized: incremental

So I agree that this is a bug, and we'd welcome a PR to fix it.

devmessias commented 1 year ago

Hi @dbeatty10, thank you for the prompt reply. I'll send a PR later with a proposed solution.

devmessias commented 1 year ago

PR here: https://github.com/dbt-labs/dbt-core/pull/8538 @dbeatty10. I've tried to create more tests, but I don't know whether there are existing tests that cover hierarchical overrides in Python models.

devmessias commented 11 months ago

Hi @dbeatty10, I'm reaching out regarding pull request #8538. I understand everyone is likely very busy with open-source work, but I was wondering whether there have been any updates or whether there is anything more I can do to assist in the review process.

alex-hsp commented 2 weeks ago

I have just unearthed this myself while debugging through dbt's source code. Basically, dbt sets a default for the Python model materialization config here, but it takes precedence over anything defined in the schema file (as stated here). With all the amazing developments towards Python incremental models, I am surprised this was not found during testing.
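To illustrate the precedence problem described above, here is a toy model of the config merge. It is not dbt's actual parser code: both functions and their parameter names are hypothetical, and the only thing they assume is the behavior reported in this thread (YAML settings are lost, while an in-model `dbt.config(...)` call still wins).

```python
PYTHON_MODEL_DEFAULT = {"materialized": "table"}


def resolve_reported(project_cfg: dict, schema_cfg: dict, in_model_cfg: dict) -> dict:
    """Toy merge matching the reported behavior (hypothetical, not dbt code).

    The Python-model default is applied after the YAML layers, so it
    overwrites whatever dbt_project.yml or schema.yml requested.
    """
    merged = {}
    merged.update(project_cfg)           # dbt_project.yml
    merged.update(schema_cfg)            # schema.yml
    merged.update(PYTHON_MODEL_DEFAULT)  # default applied too late
    merged.update(in_model_cfg)          # dbt.config(...) inside the model
    return merged


def resolve_expected(project_cfg: dict, schema_cfg: dict, in_model_cfg: dict) -> dict:
    """Toy merge with the expected order: the default is the lowest layer."""
    merged = dict(PYTHON_MODEL_DEFAULT)  # default applied first
    merged.update(project_cfg)
    merged.update(schema_cfg)
    merged.update(in_model_cfg)
    return merged
```

With `schema_cfg={"materialized": "incremental"}` and nothing set in the model, the first function ends up with `table` and the second with `incremental`, which is the difference between the behavior seen since 1.4 and the behavior expected in this issue.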

devmessias commented 2 weeks ago

Hi @alex-hsp, you might want to try one of the workarounds we discussed earlier. I would have updated the PR, but my current role has kept me really busy for quite a while. I'll try to get to it over the weekend, though. By the way, I applied for a position at dbt Labs for dbt-core. Maybe if I land the job, I'll finally have some free time :laughing:

devmessias commented 2 weeks ago

Ok, I was able to update the PR this morning https://github.com/dbt-labs/dbt-core/pull/8538

devmessias commented 1 week ago

Hi @dbeatty10, when you have a moment, could you please prioritize my PR before it gets outdated again, so that I can avoid another rebase? Thanks a lot!