datacontract / datacontract-cli

CLI to manage your datacontract.yaml files
https://cli.datacontract.com
Other
423 stars 84 forks source link

Fail CLI Test On Unknown Additional Fields in Implementation Not Present in the Contract #403

Open dataGriff opened 1 week ago

dataGriff commented 1 week ago

There should be a feedback or failure mechanism on the data contract cli test when there are additional fields present in the implementation that are not present in the contract.

Currently you can add fields to the implementaiton of the data product and the data contract cli test will pass.

Adding the failure or feedback mechanism will prevent drift of implementation from the contract. Ideally I think it should fail to force the updating of the contract.

It is worth noting that the data contract itself can evolve with additional optinal fields before the implementation, as these would be non-breaking forward changes, were the implementation can catch-up later. However we should never have extra fields in an implementation that are not present in the contract.

Original slack thread.

jochenchrist commented 1 week ago

I think we should make it explicit in the data contract that no other fields are allowed.

Option 1: additionalFields

A new property additionalFields (aligned with OpenAPI's additionalProperties).

models:
  orders:
    description: One record per order. Includes cancelled and deleted orders.
    type: table
    additionalFields: false
    fields:
      order_id:
        $ref: '#/definitions/order_id'
        required: true
        unique: true
        primary: true
      order_timestamp:
        description: The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful.
        type: timestamp
        required: true
        example: "2024-09-09T08:30:00Z"

Option 2: Config Option

models:
  orders:
    description: One record per order. Includes cancelled and deleted orders.
    type: table
    config:
      failWithAdditionalFields: false
    fields:
      order_id:
        $ref: '#/definitions/order_id'
        required: true
        unique: true
        primary: true
      order_timestamp:
        description: The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful.
        type: timestamp
        required: true
        example: "2024-09-09T08:30:00Z"

Option 3: Other names

Having a boolean fields with a default of true might not be perfect. Other ideas for better names?

dataGriff commented 1 week ago

I prefer option 1 as it marries up closer to how you'd handle the same situation with columns (e.g. the property required true/false).

However I think the default behaviour should be false?

additionalFields is false

My expectation would be that the contract can be ahead of the implementation, with fields marked as required : false, but the implementation should never be ahead of the contract? Even allowing that with the additionalFields option being set to true should come with a warning label 😄

In fact... I am talking myself out of there being any behaviour other than the implementation should simply not expect any additional columns compared to the contract and not even be allowed as an option?

Can the whole thing be handled with what is present in the data contract specification and just the extra logic required in the cli?

Proposal