datacontract / datacontract-specification

The Data Contract Specification Repository
https://datacontract.com/
MIT License

Add URL schema type explicitly #23

Closed by tarys 9 months ago

tarys commented 10 months ago

Status Quo

The currently supported schema types are:

  1. dbt
  2. bigquery
  3. json-schema
  4. sql-ddl
  5. avro
  6. protobuf
  7. custom

Motivation

Real-world schemas can be quite large, and a single data product might span multiple database tables, Kafka topics, etc. Including the schema definition inline can therefore make the data contract's YAML file large and inconvenient for a human to read. Admittedly, this is not a problem when the YAML file is only parsed automatically.

Proposal

Add an explicit URL schema type, with validation of the specification field:

schema:
  type: schema-url
  specification: https://schemaregistry.mycompany.com/path/to/actual/model/description/and/definition
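As a rough illustration of what "validation for the specification field" could mean, here is a minimal Python sketch. It assumes the contract has already been loaded into a dict, and that "valid" means an absolute http(s) URL. The `schema-url` type is the one proposed above and is not part of the specification; the function names `is_valid_schema_url` and `validate_schema` are hypothetical.

```python
from urllib.parse import urlparse


def is_valid_schema_url(specification):
    """Return True if the value looks like an absolute http(s) URL."""
    parsed = urlparse(specification)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_schema(schema):
    """Collect validation errors for a 'schema' section given as a dict.

    Only the proposed (hypothetical) 'schema-url' type is checked here;
    every other type is passed through untouched.
    """
    errors = []
    if schema.get("type") == "schema-url":
        spec = schema.get("specification")
        if not isinstance(spec, str) or not is_valid_schema_url(spec):
            errors.append("specification must be an absolute http(s) URL")
    return errors
```

A stricter check (for example, restricting allowed hosts) would be a policy decision for whoever implements the validation.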

Alternatives

Currently, the same result can be achieved via:

schema:
  type: custom
  specification: <string>

However, this approach lacks explicit validation of the URL format of the specification field and is semantically less expressive.

simonharrer commented 10 months ago

Thanks for your suggestion. There are two aspects to it:

  1. We would currently like to deprecate the whole schema section in favor of the models, as the contract should contain the types in a highly information-rich format. The cli.datacontract.com tool should convert from any schema to the model and vice versa. I'd love your opinion on this issue. Perhaps we could discuss this in #21.
  2. Providing a link to a schema file or schema URL is super helpful. We offer the $ref('') annotation to reference a URL or file. So in your case, you could simply state the following:
    schema:
      type: avro
      specification: $ref('https://schemaregistry.mycompany.com/path/to/actual/model/description/and/definition')

    Validating this URL is a bit more complicated, I agree, but this way we still know the schema type (avro).
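To make the "a bit more complicated" part concrete, here is a minimal Python sketch of how a tool might pull the target out of a `$ref('...')` value and decide whether it is a URL or a file path. It assumes the annotation always takes the single-quoted form shown above. The helper names `extract_ref` and `is_url` are hypothetical, and this is not how the Data Contract CLI is known to implement it.

```python
import re
from urllib.parse import urlparse

# Matches the single-quoted form used in this thread: $ref('target')
REF_PATTERN = re.compile(r"^\$ref\('(?P<target>[^']+)'\)$")


def extract_ref(specification):
    """Return the target inside $ref('...'), or None for an inline schema."""
    match = REF_PATTERN.match(specification.strip())
    return match.group("target") if match else None


def is_url(target):
    """Return True for absolute http(s) URLs; anything else is treated as a file path."""
    parsed = urlparse(target)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

With helpers like these, the schema type (avro in the example) stays explicit, while the URL check is applied only when the specification is a reference.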

tarys commented 9 months ago

Many thanks for the detailed answer.

  1. Will definitely take a look and provide my feedback, thanks for offering!
  2. Didn't know about this option, always happy to learn something new.