dbt-labs / dbt-common

Apache License 2.0
9 stars 10 forks source link

[Regression] v1.6.0 introduces breaking schema changes in manifest.json #170

Closed psygnoser closed 1 month ago

psygnoser commented 1 month ago

Is this a regression in a recent version of dbt-common?

Current Behavior

When building DBT, the manifest.json now contains new properties:

          "to": null,
          "to_columns": [],

in several places, like for example in nodes.models.constraints and in other nodes. This violates v12 manifest schema as seen here: https://schemas.getdbt.com/dbt/manifest/v12/index.html#nodes_additionalProperties_anyOf_i4_constraints.

Though interestingly in the raw schema file https://schemas.getdbt.com/dbt/manifest/v12.json, those properties are in fact defined. But this wasn't the case up until recently (see here for example). So in other words, there were at least two different schemas for v12 of manifest. And since additional properties are disallowed by schema config, it would follow that these kind of additions, are in fact breaking changes and thus should bump the schema version.

Expected/Previous Behavior

Building DBT produces mainfest.json without the aformentioned properties and complies to schema V12. (Downgrading this package to 1.5.0 produces expected behavior)

Steps To Reproduce

Install dbt-core 1.8.3 and run dbt

Relevant log output

No response

Additional Context

No response

QMalcolm commented 1 month ago

@psygnoser are you encountering an error due to the new fields?

For context, our view currently is that a breaking change is when implementing/consuming systems require a mitigation to continue working with the associated dependency. When altering the schema of manifest.json there are a number of things we consider breaking, primarily:

  1. Adding a required (non-defaulted / non-optional) field
  2. Renaming a field
  3. Deleting a field (because we consider present additional fields an error)
  4. Constricting the allowable types for a field

However, there are changes we currently view as non-breaking:

  1. Adding an optional field
  2. Adding a field with a default value
  3. Expanding the allowable types for a field

The fields you brought up on nodes.models.constraints, to and to_columns, currently fall in what we view as non-breaking changes. Specifically, to is an optional, and to_columns will default to an empty list.

psygnoser commented 1 month ago

Yes I have encountered an error. In essence, if you try to validate the manifest.json with the original V12 JSON schema, it will fail with errors. As mentioned before, since for example the nodes.models.constraints object has the "additionalProperties": false set, even doing those three "non-breaking" changes you've listed, will result in a failed schema validation. So if it fails the schema you've defined, then I don't see how can it not be a breaking change? I mean sure, you can update the schema (between releases ie. out-side of release cycle, without anyone knowing), without bumping the version, but that's kind of changing the rules mid-flight as it were, is it not?

Specifically what happened: I'm using the third-party dbt-artifacts-parser package to handle DBT artifact parsing. That package used pydantic to do schema validation of artifacts, for example manifest.json against a local copy of the V12 schema that was released alongside DBT 1.8, which I believe is a completely legitimate use case. And because this package is an implicit dependency of dbt-core, which allows you package to be automatically bumped, if a minor version changes ("dbt-common>=1.3.0,<2.0"), it mean't that even though In my infra env setup the dbt-core is locked to v1.8.x, it still upgraded your package from 1.5.0 to 1.6.0 when installing dbt-core, which unexpectedly caused havoc.

QMalcolm commented 1 month ago

Hi @psygnoser! Thank you for your response 🙂 Also, I apologize for my delay in getting back to you 😞

I've taken some time to discuss this among the other maintainers. I do sympathize with the problem you are experiencing. It's painful, and frustrating. However, I'd like to re-affirm that we do not consider this a regression on our end, and consider the changes we made as non-breaking.

That said, in my mind, the issue is unfortunately an implementation detail of the package dbt-artifacts-parse. It sounds like that package is hard coding a point in time version of the jsonschema. This is problematic in that a particular jsonschema of a given version might change (in additive, non-breaking ways) as we've documented. Additonally, we note in the linked documentation:

Note: Although jsonschema validation using the schemas in schemas.getdbt.com is not encouraged or formally supported, jsonschema validation should still continue to work once the schemas are updated because they are forward-compatible and can therefore be used to validate previous minor versions of the schema.

Short term solution

A workaround for your particular problem could be to pindbt-common==1.5.0 in your local environment as it sounds like the move from 1.5.0 => 1.6.0 was what highlighted the problem.

Long term solution

An issue should be opened against https://github.com/yu-iskw/dbt-artifacts-parser to either update the jsonschema they are using or to begin automatically pulling down the latest copy of a version.