datacontract / datacontract-specification

The Data Contract Specification Repository
https://datacontract.com/
MIT License
278 stars 41 forks source link

List of field types are limited #45

Closed bonisb closed 8 months ago

bonisb commented 8 months ago

The list of field types are limited to only a specific enum list. Some systems have a wider range of fields supported, some other are limited. A specific example is postgres and uuid data type. To allow this type we'd need to maintain a custom schema version.

Would it be possible to extend the list of general types with a wider range of selection, for example postgres has a bigger range of field types which does not appear in the general type selection: https://www.postgresql.org/docs/current/datatype.html#DATATYPE-TABLE. json, uuid, cidr, money, point, polygon, smallint, time, xml are all commonly used types.

Adding uuid would super useful:

                "type": {
                  "type": "string",
                  "title": "FieldType",
                  "description": "The logical data type of the field.",
                  "enum": [
                    "number",
                    "decimal",
                    "numeric",
                    "int",
                    "integer",
                    "long",
                    "bigint",
                    "float",
                    "double",
                    "string",
                    "text",
                    "varchar",
                    "boolean",
                    "timestamp",
                    "timestamp_tz",
                    "timestamp_ntz",
                    "date",
                    "array",
                    "object",
                    "record",
                    "struct",
                    "uuid",
                    "json",
                    "uuid",
                    "cidr",
                    "money",
                    "point",
                    "polygon",
                    "smallint",
                    "time",
                    "xml",
                    "bytes",
                    "null"
                  ]
                },

An example issue: With text type it's not passing the test: Type Mismatch, Expected Type: text; Actual Type: uuid

bonisb commented 8 months ago

cc @jochenchrist? would be happy to create a PR but this repo limits external contribution: "ERROR: Permission to datacontract/datacontract-specification.git denied to ..."

jochenchrist commented 8 months ago

Thanks for contributing and starting the discussion. However, I am afraid I'll vote against this proposal.

It is a difficult consideration which logical data types we want to support, as we have to find a mapping for all supported server types (not only Postgres, also JSON, BigQuery, Snowflake, Avro, Protobuf, ...) and in both directions.

The logical data types will always be a subset on what is supported by all the different technologies. I would therefore like to keep the data types rather small and specify them via the format.

Specifically, for your proposals:

uuid

type: text
format: uuid

(already supported)

json

A json data type is something to consider, as it is supported by BigQuery, Postgres

Currently, it would be modeled as text

type: text

cidr

I'd vote against, this is very special and can be models as text wtih regex

type: text

money

AFAIK, the money data type is bad practice (as tied to the database locale setting) Moneytary amounts are modeled as numeric and the currency in a separate field.

point

Model as two numbers

polygon

Too special

smallint

integer with minimum/maximum if really needed

time

A type to consider. Difficult with timezone vs local time. Currently, model as text with regex and a description to clarify the timezone.

type: text

xml

Model as text.

type: text
bonisb commented 8 months ago

@jochenchrist

With a postgresql data model, with the following definitions

models:
  consumer:
    description: details
    type: table
    fields:
      id:
        description: Unique identifier for a consumer
        type: text
        format: uuid
        required: true
        unique: true

The test fails like: Screenshot 2024-04-03 at 10 49 28

Error: Type Mismatch, Expected Type: text; Actual Type: uuid

As the postgresql type is uuid, not a text

bonisb commented 8 months ago

In the latest version it's working, so I'm closing the issue