datacontract / datacontract-specification

The Data Contract Specification Repository
https://datacontract.com/
MIT License

[Feature Request] Open up Model object to arbitrary Specification Extension #86

Closed emirkmo closed 3 months ago

emirkmo commented 3 months ago

I am wondering if the Model and Field objects could be opened up to also support extra fields via specification extension. This would also mean allowing extra fields (`extra = "allow"`) in the Pydantic config for the Model class in the CLI. (In fact, Field is already open in the CLI, just not in the README of the spec?)

Is there a reason why the Model object can be extended only in a non-standard way, via the config field? Is it to be compatible with dbt? The standard "Specification extension" could be used instead of, or in addition to, the flexible config field.

Reason: Allow additional metadata about a model that may not fit into the "config" sub-field. It also avoids further boilerplate and nesting for fields that may be quite valuable within the model object.
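
To make the nesting difference concrete, here is a minimal sketch of the two placements as plain Python dictionaries. The `id` key, its value, and the helper `get_model_id` are hypothetical and chosen only for illustration; they are not part of the specification or the CLI.

```python
from typing import Any, Optional

# Hypothetical model that stores custom metadata inside the `config` map.
model_with_config: dict[str, Any] = {
    "description": "One record per order",
    "config": {"id": "m-001"},
}

# Hypothetical model that carries the same metadata as a top-level
# extension field, with one level of nesting less.
model_with_extension: dict[str, Any] = {
    "description": "One record per order",
    "id": "m-001",
}


def get_model_id(model: dict[str, Any]) -> Optional[str]:
    """Return the custom id, preferring the top-level field over `config`."""
    if "id" in model:
        return model["id"]
    return model.get("config", {}).get("id")
```

Both shapes carry the same information; the difference is only where a consumer has to look for it.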

Example use case:

One that we use for example is an id field (for both models and fields). Having an explicit id makes it possible to rename models unambiguously. The rename operation is automatically understood because the id has not changed, as opposed to a drop & re-create where the id would change.

I assume the lack of an id field comes from the dbt way of working, and from the fact that the name of the model is supposed to be a pseudo-id? But we really want to assign unique ids to models (tables) so that intentions are clear without a human in the loop. We don't have the luxury of re-creating tables that hold many terabytes of data just for a rename.
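
The rename detection described above could be sketched roughly as follows. This is only an illustration under stated assumptions: every model carries a hypothetical `id` extension field, models are keyed by their technical name, and `diff_models` is a made-up helper, not something the CLI provides.

```python
from typing import Any


def diff_models(
    old: dict[str, dict[str, Any]], new: dict[str, dict[str, Any]]
) -> dict[str, list]:
    """Classify changes between two versions of a contract's models by id."""
    # Map each (hypothetical) id to the model name it belongs to.
    old_by_id = {m["id"]: name for name, m in old.items()}
    new_by_id = {m["id"]: name for name, m in new.items()}

    # Same id, different name: a rename.
    renamed = sorted(
        (old_by_id[i], new_by_id[i])
        for i in old_by_id.keys() & new_by_id.keys()
        if old_by_id[i] != new_by_id[i]
    )
    # Id disappeared: the model was dropped.
    dropped = sorted(old_by_id[i] for i in old_by_id.keys() - new_by_id.keys())
    # Id is new: the model was created.
    created = sorted(new_by_id[i] for i in new_by_id.keys() - old_by_id.keys())
    return {"renamed": renamed, "dropped": dropped, "created": created}


old_models = {"orders": {"id": "m-001"}, "customers": {"id": "m-002"}}
new_models = {"sales_orders": {"id": "m-001"}, "customers": {"id": "m-003"}}

changes = diff_models(old_models, new_models)
```

Here `orders` becoming `sales_orders` is reported as a rename because the id is unchanged, while `customers` is reported as dropped and re-created because its id changed even though the name did not.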

emirkmo commented 3 months ago

I did some digging: the config field was added recently, via the configMap approach. Is there a good reason not to open it up further?

jochenchrist commented 3 months ago

Hi @emirkmo,

Thanks for your suggestion.

> I assume the lack of an id field comes from the dbt way of working, and from the fact that the name of the model is supposed to be a pseudo-id?

In fact, the specification follows OpenAPI / JSON Schema conventions, so there is no id field. The key of each model or field is its actual technical name. You could use the title field for the display name.

If you want to use a custom field in the specification, you can do so, but Data Contract CLI would ignore it. For that, the current way - as you identified - is to use the config map.
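
As a rough illustration of what "ignore" means here, a parser that only knows a fixed set of keys silently drops anything else, while the `config` map passes through. This is a standard-library sketch, not the CLI's actual code; `ClosedModel`, `KNOWN_KEYS`, and the `id` values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical subset of keys that the closed parser understands.
KNOWN_KEYS = {"description", "type", "config"}


@dataclass
class ClosedModel:
    description: str = ""
    type: str = "table"
    config: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def parse(cls, raw: dict[str, Any]) -> "ClosedModel":
        # Keys outside KNOWN_KEYS are dropped without an error.
        return cls(**{k: v for k, v in raw.items() if k in KNOWN_KEYS})


raw_model = {
    "description": "One record per order",
    "id": "m-001",                # custom top-level field: dropped
    "config": {"id": "m-001"},    # custom metadata in config: kept
}

parsed = ClosedModel.parse(raw_model)
```

After parsing, `parsed.config` still contains the id, but the top-level `id` is gone.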

I am not sure how we can proceed here. Do you have any specific suggestions?

emirkmo commented 3 months ago

My suggestion would be simply to open up Field and Model to specification extension, as some other objects already are.

That would already handle everything.
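
One way a tool could honor such extension fields is to keep every unknown key in a side dictionary instead of dropping it. The sketch below uses only the standard library; `OpenModel`, its `extras` attribute, and the `id` value are hypothetical and do not describe how the CLI is implemented.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class OpenModel:
    description: str = ""
    # Any key the class does not know about ends up here.
    extras: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def parse(cls, raw: dict[str, Any]) -> "OpenModel":
        known = {"description"}
        return cls(
            description=raw.get("description", ""),
            extras={k: v for k, v in raw.items() if k not in known},
        )


model = OpenModel.parse({"description": "One record per order", "id": "m-001"})
```

With this shape, `model.extras["id"]` is available to downstream code without any change to the known fields.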

(Re: id, we considered using title as the id, but that goes against the ideal of being clear and explicit. We are happy to add it as a “Specification extension”. The CLI is already a modular library that is easy to extend :) )

jochenchrist commented 3 months ago

OK, I updated the specification accordingly.