Open dataGriff opened 1 week ago
I think we should make it explicit in the data contract that no other fields are allowed.
A new property additionalFields
(aligned with OpenAPI's additionalProperties).
additionalFields
is not set -> There can be additional fields or not (default)additionalFields
is true
-> There are likely additional fieldsadditionalFields
is false
-> datacontract test
should fail if there are other fields.models:
orders:
description: One record per order. Includes cancelled and deleted orders.
type: table
additionalFields: false
fields:
order_id:
$ref: '#/definitions/order_id'
required: true
unique: true
primary: true
order_timestamp:
description: The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful.
type: timestamp
required: true
example: "2024-09-09T08:30:00Z"
models:
orders:
description: One record per order. Includes cancelled and deleted orders.
type: table
config:
failWithAdditionalFields: false
fields:
order_id:
$ref: '#/definitions/order_id'
required: true
unique: true
primary: true
order_timestamp:
description: The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful.
type: timestamp
required: true
example: "2024-09-09T08:30:00Z"
Having a boolean fields with a default of true might not be perfect. Other ideas for better names?
I prefer option 1 as it marries up closer to how you'd handle the same situation with columns (e.g. the property required
true/false).
However I think the default behaviour should be false?
additionalFields
is false
My expectation would be that the contract can be ahead of the implementation, with fields marked as required : false
, but the implementation should never be ahead of the contract? Even allowing that with the additionalFields
option being set to true should come with a warning label 😄
In fact... I am talking myself out of there being any behaviour other than the implementation should simply not expect any additional columns compared to the contract and not even be allowed as an option?
Can the whole thing be handled with what is present in the data contract specification and just the extra logic required in the cli?
There should be a feedback or failure mechanism on the data contract cli test when there are additional fields present in the implementation that are not present in the contract.
Currently you can add fields to the implementaiton of the data product and the data contract cli test will pass.
Adding the failure or feedback mechanism will prevent drift of implementation from the contract. Ideally I think it should fail to force the updating of the contract.
It is worth noting that the data contract itself can evolve with additional optinal fields before the implementation, as these would be non-breaking forward changes, were the implementation can catch-up later. However we should never have extra fields in an implementation that are not present in the contract.
Original slack thread.