fivetran / dbt_qualtrics_source

Fivetran's Qualtrics Source dbt package
https://fivetran.github.io/dbt_qualtrics_source/
Apache License 2.0
2 stars 2 forks source link

[Bug] Failure in test dbt_utils_unique_combination_of_columns_stg_qualtrics__distribution_contact_contact_id__distribution_id__source_relation (models/stg_qualtrics.yml) #7

Closed dmcmtntp closed 3 weeks ago

dmcmtntp commented 1 month ago

Is there an existing issue for this?

Describe the issue

The Unique Rows test is failing for stg_qualtrics__distribution_contact https://github.com/fivetran/dbt_qualtrics_source/blob/36dfc658cee91975d66e38f33c6ad8dc6fecba1b/models/stg_qualtrics.yml#L559-L566

I'm not sure if this is due to an issue with the definition of the test, the stg_qualtrics__distribution_contact model, or the qualtrics connector.

Relevant error log or model output

21:58:09  Running with dbt=1.8.0
21:58:09  Registered adapter: snowflake=1.8.0
21:58:10  Found 203 models, 37 snapshots, 1 seed, 339 data tests, 99 sources, 707 macros
21:58:10  
21:58:11  Concurrency: 10 threads (target='dev')
21:58:11  
21:58:11  1 of 1 START test dbt_utils_unique_combination_of_columns_stg_qualtrics__distribution_contact_contact_id__distribution_id__source_relation  [RUN]
21:58:11  1 of 1 FAIL 1315 dbt_utils_unique_combination_of_columns_stg_qualtrics__distribution_contact_contact_id__distribution_id__source_relation  [FAIL 1315 in 0.43s]
21:58:11  
21:58:11  Finished running 1 test in 0 hours 0 minutes and 1.27 seconds (1.27s).
21:58:11  
21:58:11  Completed with 1 error and 0 warnings:
21:58:11  
21:58:11  Failure in test dbt_utils_unique_combination_of_columns_stg_qualtrics__distribution_contact_contact_id__distribution_id__source_relation (models/stg_qualtrics.yml)
21:58:11    Got 1315 results, configured to fail if != 0
21:58:11  
21:58:11    compiled code at target/compiled/qualtrics_source/models/stg_qualtrics.yml/dbt_utils_unique_combination_o_c192fcc0522c3acc88dfab811ca69771.sql
21:58:11  
21:58:11  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

Expected behavior

That the models and tests would run without failing, since they are running on the raw data from the fivetran connector.

dbt Project configurations

models: qualtrics: +schema: warehouse +tags: warehouse qualtrics_source: +schema: staging +tags: staging

vars: qualtrics_database: fivetran_db qualtrics_schema: qualtrics_v2 qualtrics__directory_contact_pass_through_columns:

Package versions

packages:

What database are you using dbt with?

snowflake

dbt Version

1.8.0

Additional Context

The test is looking for unique combinations of the fields contact_id, distribution_id, and source_relation in the stg_qualtrics__directory_contact model. One thing that jumped out to me when I looked at the compiled SQL for this model is that there doesn't seem to be any deduplication happening in the model.

Here is an extract of the rows in stg_qualtrics__directory_contact which are failing the test: stg_qualtrics__distribution_contact - failed rows.csv

This extract was generated with the following SQL:

with validation_errors as (

    select
        contact_id, distribution_id, source_relation
    from stg_qualtrics__distribution_contact
    group by contact_id, distribution_id, source_relation
    having count(*) > 1

)

SELECT * EXCLUDE (survey_link)
,hash(survey_link) as survey_link_hashed
FROM stg_qualtrics__distribution_contact
WHERE contact_id||distribution_id||source_relation IN (SELECT contact_id||distribution_id||source_relation FROM validation_errors )
ORDER BY distribution_id, contact_id

__

Are you willing to open a PR to help address this issue?

fivetran-reneeli commented 1 month ago

Thanks @dmcmtntp for raising this! I took a look at our connector notes and it seems that adding contact_lookup_id to the unique test should do the trick. I'll loop my team in and work on getting a fix out in the coming weeks.

fivetran-reneeli commented 1 month ago

Also, saw that you're opening up a PR! If you wish to continue, it will be adding contact_lookup_id to the unique test here.

dmcmtntp commented 1 month ago

Sure thing I can open a pull request with that change. Quick question though - do you have any understanding of what the difference is between contact_id vs. contact_lookup_id?

dmcmtntp commented 1 month ago

Opened pull request here: https://github.com/fivetran/dbt_qualtrics_source/pull/8

fivetran-reneeli commented 1 month ago

Thanks @dmcmtntp!

Based on the Qualtrics docs, it looks like the below is the difference: Contact Id: The ID for the contact on a directory level, regardless of distribution list. Lookup Id: The ID of a specific contact in a specific list. It can be a unique identifier that is used to look up a contact across different systems or within a specific organization. It might be used for other integrations where a different ID is used to track the same person.

So the contact ID is probably 1 to 1 with a user, while many lookup IDs can belong to a single user.

I imagine only taking contact_id + distribution_id as the unique parameter is not enough in the distribution_contact table if there's several of those records for different lookup IDs

fivetran-avinash commented 1 month ago

Hi @dmcmtntp ! I'll be taking on merging your changes into a new release of the dbt_qualtrics_source package. I did a quick review and have a few notes you can view here; once those are addressed, this should be good to be merged in and we will kick off our release process!

Let me know if you have any questions!

fivetran-avinash commented 1 month ago

Hey @dmcmtntp ! Hope you were able to get a chance to review the PR!

The changes are pretty small, so we will go ahead and merge them into our branch on Tuesday if we don't hear back.

Have a good weekend!

fivetran-avinash commented 3 weeks ago

Hi @dmcmtntp your test should now be live in the latest version of qualtrics_source.

Let us know if you have any questions or issues!

dmcmtntp commented 3 weeks ago

The updates removed the error in my dbt implementation, thank you!