dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[CT-3497] I want to add a description/label to each of the rows in my unit test to explicitly call out the edge cases I'm testing for #9283

Open graciegoheen opened 10 months ago

graciegoheen commented 10 months ago

Describe the feature

When creating a unit test in my project:

unit_tests:
  - name: a # this is the unique name of the test
    model: dim_wizards # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: ref('stg_wizards')
        rows:
          - {wizard_id: 1, email: cool@example.com,     email_top_level_domain: example.com}
          - {wizard_id: 2, email: cool@unknown.com,     email_top_level_domain: unknown.com}
          - {wizard_id: 3, email: badgmail.com,         email_top_level_domain: gmail.com}
          - {wizard_id: 4, email: missingdot@gmailcom,  email_top_level_domain: gmail.com}
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
      - input: ref('stg_worlds')
        rows:
          - {world_id: 1}
    expect: # the expected output given the inputs above
      rows:
        - {wizard_id: 1, is_valid_email_address: true}
        - {wizard_id: 2, is_valid_email_address: false}
        - {wizard_id: 3, is_valid_email_address: false}
        - {wizard_id: 4, is_valid_email_address: false}

I want to optionally add a description/label to each of my input rows to explain which edge case it covers. Something like:

      - input: ref('stg_wizards')
        rows:
          - {wizard_id: 1, email: cool@example.com,     email_top_level_domain: example.com}
             description: valid email
          - {wizard_id: 2, email: cool@unknown.com,     email_top_level_domain: unknown.com}
             description: incorrect email domain
          - {wizard_id: 3, email: badgmail.com,         email_top_level_domain: gmail.com}
             description: no @ symbol
          - {wizard_id: 4, email: missingdot@gmailcom,  email_top_level_domain: gmail.com}
             description: no period

More product/DX refinement is needed on the spec. We should be able to add descriptions/labels regardless of which format: is used.

Describe alternatives you've considered

I could just put a large block of text in the description: field of the unit test.

dbeatty10 commented 10 months ago

Good idea about describing each test case 🤩

Adding a description as an additional sub-item of each row might be tricky.

With the features that currently exist, here are several different ways to describe individual test cases (none of which I have actually tested to confirm whether they work):

  1. One description to rule them all
  2. YAML comments
  3. Individual unit tests

How do you feel about the pros/cons of each? (Can't say I had the most fun writing out 3. 😂)

One description to rule them all

unit_tests:
  - name: a # this is the unique name of the test
    description: |
      There are four test cases:
      1. valid email
      2. incorrect email domain
      3. no @ symbol
      4. no period
    model: dim_wizards # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: ref('stg_wizards')
        rows:
          ....

YAML comments

      - input: ref('stg_wizards')
        rows:
          # valid email
          - {wizard_id: 1, email: cool@example.com,     email_top_level_domain: example.com}
          # incorrect email domain
          - {wizard_id: 2, email: cool@unknown.com,     email_top_level_domain: unknown.com}
          # no @ symbol
          - {wizard_id: 3, email: badgmail.com,         email_top_level_domain: gmail.com}
          # no period
          - {wizard_id: 4, email: missingdot@gmailcom,  email_top_level_domain: gmail.com}

Individual unit tests

unit_tests:

  - name: a_valid_email
    description: valid email
    model: dim_wizards
    given:
      - input: ref('stg_wizards')
        rows:
          - {wizard_id: 1, email: cool@example.com,     email_top_level_domain: example.com}
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
      - input: ref('stg_worlds')
        rows:
          - {world_id: 1}
    expect:
      rows:
        - {wizard_id: 1, is_valid_email_address: true}

  - name: a_incorrect_email_domain
    description: incorrect email domain
    model: dim_wizards
    given:
      - input: ref('stg_wizards')
        rows:
          - {wizard_id: 2, email: cool@unknown.com,     email_top_level_domain: unknown.com}
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
      - input: ref('stg_worlds')
        rows:
          - {world_id: 1}
    expect:
      rows:
        - {wizard_id: 2, is_valid_email_address: false}

  - name: a_no_at_symbol
    description: no @ symbol
    model: dim_wizards
    given:
      - input: ref('stg_wizards')
        rows:
          - {wizard_id: 3, email: badgmail.com,         email_top_level_domain: gmail.com}
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
      - input: ref('stg_worlds')
        rows:
          - {world_id: 1}
    expect:
      rows:
        - {wizard_id: 3, is_valid_email_address: false}

  - name: a_no_period
    description: no period
    model: dim_wizards
    given:
      - input: ref('stg_wizards')
        rows:
          - {wizard_id: 4, email: missingdot@gmailcom,  email_top_level_domain: gmail.com}
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
      - input: ref('stg_worlds')
        rows:
          - {world_id: 1}
    expect:
      rows:
        - {wizard_id: 4, is_valid_email_address: false}

alison985 commented 8 months ago

FWIW, there would be value in printing the description of the test case in the test output to help with debugging. Individual unit tests aren't DRY. YAML comments wouldn't appear in the output when running the test.

Of the three above, I like description best. It may also be the easiest thing to add to test output. It also gives space for longer descriptions. It does mean whoever updates test cases has to remember to update the description though.
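
To make the debugging benefit concrete, here is a minimal Python sketch of what per-row descriptions in test output could look like. It is not how dbt reports unit test failures; `LabeledRow` and `describe_failures` are hypothetical names invented for this illustration, and matching rows on a key column is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class LabeledRow:
    """One expected row plus an optional label (hypothetical shape, not a dbt type)."""
    values: dict
    description: Optional[str] = None


def describe_failures(expected: list, actual: list, key: str) -> list:
    """Return one message per expected row that is missing from, or differs in, `actual`."""
    actual_by_key = {row[key]: row for row in actual}
    messages = []
    for row in expected:
        got = actual_by_key.get(row.values[key])
        if got != row.values:
            label = row.description or "(no description)"
            messages.append(
                f"{key}={row.values[key]} [{label}]: expected {row.values}, got {got}"
            )
    return messages


expected = [
    LabeledRow({"wizard_id": 1, "is_valid_email_address": True}, "valid email"),
    LabeledRow({"wizard_id": 3, "is_valid_email_address": False}, "no @ symbol"),
]
# Pretend the model wrongly accepted the address without an @ symbol.
actual = [
    {"wizard_id": 1, "is_valid_email_address": True},
    {"wizard_id": 3, "is_valid_email_address": True},
]
for message in describe_failures(expected, actual, key="wizard_id"):
    print(message)
```

With the label attached to the row, the failure message names the edge case that broke, so nobody has to map a row number back to a prose list.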

This isn't a great idea, because it depends on implied order, which whoever updates the test cases would again have to remember to keep in sync, but you could do:

unit_tests:
  - name: a # this is the unique name of the test
    model: dim_wizards # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: ref('stg_wizards')
        rows:
          - {wizard_id: 1, email: cool@example.com,     email_top_level_domain: example.com}
          - {wizard_id: 2, email: cool@unknown.com,     email_top_level_domain: unknown.com}
          - {wizard_id: 3, email: badgmail.com,         email_top_level_domain: gmail.com}
          - {wizard_id: 4, email: missingdot@gmailcom,  email_top_level_domain: gmail.com}
        description:
          - "valid email"
          - "incorrect email domain"
          - "no @ symbol"
          - "no period"
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
      - input: ref('stg_worlds')
        rows:
          - {world_id: 1}
    expect: # the expected output given the inputs above
      rows:
        - {wizard_id: 1, is_valid_email_address: true}
        - {wizard_id: 2, is_valid_email_address: false}
        - {wizard_id: 3, is_valid_email_address: false}
        - {wizard_id: 4, is_valid_email_address: false}
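
The implied-order concern can be sketched in a few lines of Python. This only illustrates the parallel-list idea above; `pair_rows_with_descriptions` is a hypothetical helper, not part of dbt. A length check catches a row that was added or removed without touching the descriptions, but nothing can catch two rows being swapped.

```python
def pair_rows_with_descriptions(rows: list, descriptions: list) -> list:
    """Pair each row with the description at the same position.

    Raises ValueError when the two lists have different lengths, which is the
    only kind of drift that positional matching is able to detect.
    """
    if len(rows) != len(descriptions):
        raise ValueError(
            f"{len(rows)} rows but {len(descriptions)} descriptions; "
            "the lists must stay in sync"
        )
    return list(zip(descriptions, rows))


rows = [
    {"wizard_id": 1, "email": "cool@example.com"},
    {"wizard_id": 3, "email": "badgmail.com"},
]
for description, row in pair_rows_with_descriptions(rows, ["valid email", "no @ symbol"]):
    print(f"{description}: {row}")
```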

The following is probably slightly better, both for the developer experience and for avoiding bugs caused by implied order. However, it may be worse if it performs more queries, or depending on how the last element of each row would have to flow through. I have no knowledge of unit_tests outside of this thread, so I can't guess.

    expect: # the expected output given the inputs above
      rows:
        - {wizard_id: 1, is_valid_email_address: true, 'valid email'}
        - {wizard_id: 2, is_valid_email_address: false, 'incorrect email domain'}
        - {wizard_id: 3, is_valid_email_address: false, 'no @ symbol'}
        - {wizard_id: 4, is_valid_email_address: false, 'no period'}
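
One wrinkle with the inline form: under YAML's flow-mapping rules, a bare entry such as 'no period' is read as a key with a null value, so the label would arrive as just another key in the row. The Python sketch below shows one way a loader could separate it, assuming the model's column names are known up front. `split_inline_label` is a hypothetical helper, and nothing here reflects how dbt actually parses unit test rows.

```python
from typing import Optional


def split_inline_label(row: dict, columns: set) -> tuple:
    """Split a parsed row into (column data, label).

    Any key that is not a known column is treated as the label. At most one
    such key is allowed, and it must carry no value of its own.
    """
    data = {key: value for key, value in row.items() if key in columns}
    extras = [key for key in row if key not in columns]
    if len(extras) > 1:
        raise ValueError(f"more than one non-column key: {extras}")
    label: Optional[str] = None
    if extras:
        if row[extras[0]] is not None:
            raise ValueError(f"unknown column with a value: {extras[0]!r}")
        label = extras[0]
    return data, label


# What a YAML loader would be expected to produce for the last row above.
parsed = {"wizard_id": 4, "is_valid_email_address": False, "no period": None}
data, label = split_inline_label(parsed, {"wizard_id", "is_valid_email_address"})
print(label, data)
```

Relying on the known column names matters because a null-valued key alone is ambiguous: a real column that is explicitly set to null looks the same after parsing.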