dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[CT-3495] I want to use spacing to make my csv input/expected output mock data more readable for unit tests #9280

Open graciegoheen opened 11 months ago

graciegoheen commented 11 months ago

Is this a new bug in dbt-core?

Current Behavior

If I want to use format: csv to define the mock data for a unit test, I cannot put any spaces around the values:

unit_tests:
  - name: b
    model: dim_wizards 
    given: 
      - input: ref('stg_wizards')
        format: csv
        rows: |
          wizard_id,email,email_top_level_domain
          1,cool@example.com,example.com
          2,cool@unknown.com,unknown.com
          3,badgmail.com,gmail.com
          4,missingdot@gmailcom,gmail.com
...

This is even true if I want to use a fixture file:

unit_tests:
  - name: c
    model: dim_wizards 
    given: 
      - input: ref('stg_wizards')
        format: csv
        fixture: wizard_emails_input
...
# tests/fixtures/wizard_emails_input

wizard_id,email,email_top_level_domain
1,cool@example.com,example.com
2,cool@unknown.com,unknown.com
3,badgmail.com,gmail.com
4,missingdot@gmailcom,gmail.com

This makes the mock data harder to read and write.

Expected Behavior

I should be able to use spaces when defining my mock data for a unit test using format: csv:

unit_tests:
  - name: b
    model: dim_wizards 
    given: 
      - input: ref('stg_wizards')
        format: csv
        rows: |
          wizard_id, email, email_top_level_domain
          1, cool@example.com, example.com
          2, cool@unknown.com, unknown.com
          3, badgmail.com, gmail.com
          4, missingdot@gmailcom, gmail.com
...

Spaces should also be allowed in fixture files:

unit_tests:
  - name: c
    model: dim_wizards 
    given: 
      - input: ref('stg_wizards')
        format: csv
        fixture: wizard_emails_input
...
# tests/fixtures/wizard_emails_input

wizard_id, email, email_top_level_domain
1, cool@example.com, example.com
2, cool@unknown.com, unknown.com
3, badgmail.com, gmail.com
4, missingdot@gmailcom, gmail.com
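
As a rough illustration of the requested behavior, here is a minimal sketch using Python's standard csv module. It is not dbt's actual fixture parser, which may work differently, and the helper name parse_rows is hypothetical. With skipinitialspace=True, whitespace directly after each comma is dropped, so the padded and compact rows parse to the same values.

```python
import csv
import io


def parse_rows(text):
    """Parse CSV text, ignoring whitespace that directly follows a comma.

    Hypothetical helper for illustration only; not part of dbt.
    """
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return list(reader)


# The same mock data written without and with spaces after the commas.
compact = (
    "wizard_id,email,email_top_level_domain\n"
    "1,cool@example.com,example.com\n"
    "3,badgmail.com,gmail.com\n"
)
padded = (
    "wizard_id, email, email_top_level_domain\n"
    "1, cool@example.com, example.com\n"
    "3, badgmail.com, gmail.com\n"
)

# Both spellings yield identical column names and values.
assert parse_rows(compact) == parse_rows(padded)
```

Note that skipinitialspace only covers whitespace after a delimiter. Padding placed before a comma, for example to align columns, would still end up in the value and would need separate stripping.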

graciegoheen commented 10 months ago

Notes from refinement:

graciegoheen commented 10 months ago

Here's what happens with seeds:

  1. created a new seed in my project with spaces
    column_a, column_b
    1, grace
    2, doug
  2. executed dbt seed

The created table in the warehouse -> my_seed

So the values in column_b kept their leading spaces: " grace" & " doug". This does seem to be consistent with what's happening for unit tests - see example here.
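
The same effect is easy to reproduce outside dbt. The sketch below feeds the seed content from above through Python's standard csv module with default settings. It is only an analogy for the seed result, not the code path dbt uses to load seeds.

```python
import csv
import io

# The seed file from the steps above, with a space after each comma.
seed_text = "column_a, column_b\n1, grace\n2, doug\n"

rows = list(csv.reader(io.StringIO(seed_text)))
header, data = rows[0], rows[1:]

# By default the space after the comma is treated as part of the field.
column_b = [row[1] for row in data]
print(header)    # ['column_a', ' column_b']
print(column_b)  # [' grace', ' doug']
```

In this sketch the header of the second column also keeps its leading space, so any comparison against an unpadded column name or value would fail.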