dbeatty10 / dbt-mysql

dbt-mysql contains all of the code enabling dbt to work with MySQL and MariaDB
Apache License 2.0
75 stars 53 forks source link

Bug in macro incremental_delete when using composite keys for unique_key #143

Closed lpezet closed 1 year ago

lpezet commented 1 year ago

Describe the bug

When using array of columns for unique_key in incremental materialization, the macro incremental_delete chokes and the following error is thrown, using MySQL 8. I'm using code from main branch.

Example of model:

{{ config(materialized="incremental", unique_key=["actor_id", "film_id"]) }}
select * from {{ source('raw', 'seed') }}

Example of seed:

actor_id,film_id,some_date
1,1,1981-05-20T06:46:51
2,1,1978-09-03T18:10:33
3,1,1982-03-11T03:59:51

Exception thrown when running dbt run:

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '['actor_id', 'film_id']) in (
ERROR    configured_std_out:functions.py:236 20:04:01    1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '['actor_id', 'film_id']) in (

Steps To Reproduce

git clone https://github.com/lpezet/dbt-mysql.git
git checkout fix-incremental-composite-keys
# get setup for integration tests
docker-compose up -d
PYTHONPATH=. pytest -v --profile mysql tests/functional/adapter/test_basic.py::TestIncrementalCompositeKeysMySQL::test_incremental_with_composite_keys
docker-compose down

Expected behavior

Exception not thrown and rows from seed deleted and then re-inserted without creating duplicates.

The operating system you're using: Linux GoldenHind 5.15.90.1-microsoft-standard-WSL2 #1 SMP Fri Jan 27 02:56:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

The output of python --version: Python 3.8.10

MySQL Server 8.0.29

Additional context

The branch fix-incremental-composite-keys actually contains the test to ,well, test incremental materialization with composite keys. I couldn't find the equivalent in dbt-tests-adapter code...strangely...I must have missed something. I therefore created the class tests/functional/adapter/incremental_composite.py based on dbt-tests-adapter's very own BaseIncremental class and added a test in tests/functional/adapter/test_basic.py called TestIncrementalCompositeKeysMySQL.

The problem, I believe, is in 2 parts:

I believe the following would fix the issue:

{% macro incremental_delete(tmp_relation, target_relation, unique_key=none, statement_name="pre_main") %}
    {%- if unique_key is not none -%}
    delete
    from {{ target_relation }}
    where ({{ unique_key | join(',') }}) in (
        select {{ unique_key | join(',') }}
        from {{ tmp_relation }}
    )
    {%- endif %}

{%- endmacro %}

NB: No *GPT has been harmed, or even used!, during the creation of all this. Just lots of time spent figuring out dbt-tests-adapter and dbt-mysql, git-ing, coding, and typing this very long ticket. Just like old (?) times ;)

moszutij commented 1 year ago

Perhaps the following implementation for backwards compatibility:

{% macro incremental_delete(tmp_relation, target_relation, unique_key=none, statement_name="pre_main") %}
    {%- if unique_key is not none -%}
    delete
    from {{ target_relation }}
    where ({{ unique_key if unique_key is string else unique_key | join(',') }}) in (
        select {{ unique_key if unique_key is string else unique_key | join(',') }}
        from {{ tmp_relation }}
    )
    {%- endif %}

{%- endmacro %}
lpezet commented 1 year ago

@moszutij Your solution is definitely better. I ran my "solution" (let's call it a quick hack) before and all tests pass. That means even the standard dbt-tests-adapter tests pass. I'll add some tests to make sure my hack would fail those. It is surprising those cases are not tested by dbt-tests-adapter (I'm referring to this: https://github.com/dbt-labs/dbt-tests-adapter/blob/main/dbt/tests/adapter/basic/test_incremental.py). Maybe I missed something or I'm looking at the wrong thing?

moszutij commented 1 year ago

After examining your solution, I decided to give it a try, but later realised that it only worked for a list of column names enclosed in single quotes, and not for a string representation. However, the documentation suggests that both formats are acceptable, hence the suggested implementation. I haven't had the chance to familiarise myself with the standard dbt-tests-adapter tests yet, but it's on my to-do list 👩‍💻. I'm also using dbt with mysql, so appreciate the pull requests.

lpezet commented 1 year ago

@moszutij Thanks for checking it out. I really appreciate. I actually just found those tests in dbt-core itself: https://github.com/dbt-labs/dbt-core/blob/1.2.latest/tests/adapter/dbt/tests/adapter/incremental/test_incremental_unique_id.py Way more thorough than what I came up with. I'll update my branch for PR to leverage that test.

There are even more tests in 1.4.latest (shameless plug for my PR #146) when we get there.

dbeatty10 commented 1 year ago

Thanks for your time and effort into this @lpezet and @moszutij ! 🏆

I'm out of pocket this week, but looking forward to collaborating with you next week to get these merged and released.