dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
10.01k stars 1.63k forks source link

[Feature] populate model['constraints'] even if the contract is not enforced #10219

Open graciegoheen opened 6 months ago

graciegoheen commented 6 months ago

Is this your first time submitting a feature request?

Describe the feature

From @kmarq in https://github.com/dbt-labs/dbt-core/issues/10195:

Recreated some previous testing of constraints. If you define a primary_key constraint on a column

    columns:
      - name: col_1
        data_type: string
        constraints:
          - type: not_null
          - type: primary_key

The constraint shows up in the model.columns

"columns": {
        "col_1": {
            "constraints": [
                {
                    "type": "not_null",
                    "warn_unenforced": true,
                    "warn_unsupported": true
                },

whether the contract enforcement is true or false.

If you define a composite primary_key on the model

models:
  - name: model_a
    config:
      contract:
        enforced: true
    description: description
    columns:
      - name: col_1
        data_type: string
        constraints:
          - type: not_null
          - type: primary_key
      - name: col_2
        data_type: string
        constraints:
          - type: not_null
    constraints:
      - type: primary_key
        columns: [col_1, col_2]

Then in the model.constraints you will see this ONLY if contract enforcement is true:

 "constraints": [
        {
            "columns": [
                "col_1",
                "col_2"
            ],
            "type": "primary_key",
            "warn_unenforced": true,
            "warn_unsupported": true
        }
    ],
 "contract": {
        "alias_types": true,
        "checksum": "65ba38db6e4dd6a88418461c76b805acd2e39a113404a8a9c3d556c020564ea4",
        "enforced": true
    },

compared to false:

"constraints": [],
"contract": {
        "alias_types": true,
        "enforced": false
    },

It looks like we're parsing model-level constraints (populating model['constraints']) only if contract enforcement is enabled.

Acceptance Criteria

Who will this benefit?

No response

Are you interested in contributing this feature?

No response

Anything else?

No response

ChenyuLInx commented 4 months ago

Take a look at the migration path for column level constraints. @MichelleArk Can you add a link?

ChenyuLInx commented 4 months ago

Add a snowplow event to track how often this happens as part of the work here.

MichelleArk commented 4 months ago

To make this backward-compatible, we could:

  1. wrap the validation behind try/except blocks, and emit an event (snowplow as @ChenyuLInx mentioned above) if the validation fails for uncontracted models. Ideally this is not user-facing to start, but will just get us a sense of how pervasive the error may be.
  2. depending on data in (1), figure out a timeline to flip a legacy flag default

or

  1. try/except the validation and raise a warning + don't include in model['constriants'] if invalid contracts
  2. no lifecycle to eventually raise error
kmarq commented 1 month ago

Just wanted to comment on this one. There's been lots of development on adding more constraint support and now custom constraints. However that's all blocked by requiring a contract even though many target systems will allow adding constraints through later alter statements (usually failing if the constraint isn't valid for existing data). It would still be really great to be able to apply these constraints outside a contract.

I'd suggest push it to the adapters to determine if they can support constraints through an alter vs requiring a contract up front.

dluo-sig commented 1 month ago

Completely agree with this. Constraints and column definitions should be separate. Columns can and already are pulled based on the model .sql when not defined. I shouldn't have to explicitly define 100+ columns just so that I can have a composite primary key on two of them. Yes, I can have a test that validates the constraint instead of having it on the database object, but it's still better to have it as table metadata so that users querying the data have the information they need.