dbt-labs / dbt-jsonschema

Make table-level and column-level `meta` extensible for other tools #68

Open yu-iskw opened 1 year ago

yu-iskw commented 1 year ago

Motivation

Some tools that integrate with dbt define their own custom schema under the meta property at the table level and column level. For instance, Lightdash lets us declare metrics as shown below.

https://docs.lightdash.com/guides/how-to-create-metrics

# schema.yml
version: 2
models:
  - name: "orders"
    description: "A table of all orders."
    columns:
      - name: "status"
        description: "Status of an order: ordered/processed/complete"
      - name: "order_id"
        meta:
          metrics:
            total_order_count:
              type: count_distinct
      - name: "order_value"
        meta:
          metrics:
            total_sales:
              type: sum

The schema currently defines the meta property as a plain object. I don't have a good idea yet for how to support this kind of extensibility in JSON Schema, but it would be great if the schema could be opened up to other tools.

https://github.com/dbt-labs/dbt-jsonschema/blob/main/schemas/dbt_yml_files.json#L680-L682

joellabes commented 1 year ago

Agreed! I don't think that JSON Schema itself is extensible in that way, but if there is any way to augment a schema with components from another schema then that would be awesome.

yu-iskw commented 4 months ago

@joellabes I have an idea for supporting other tools that define their own custom meta: use $ref to include external files and anyOf to combine them. We can define the custom meta for each tool independently and include each one in dbt_yml_files-latest.json.

Overview

To enhance interoperability between dbt and tools like Lightdash, I propose extending the JSON schemas used in dbt to support external schema references. This approach will allow seamless integration and customization of metadata at various levels, facilitating better data management and analysis.

Proposal Details

1. Utilizing $ref in JSON Schemas:

Use the $ref feature to include external schemas, enabling separate management of schemas while maintaining a unified structure.

2. Merging Definitions with anyOf:

Use anyOf to combine multiple definitions for a property, allowing different tools to extend the schema as needed.

3. Support for Various Resource Types:

dbt supports various resource types, including models, sources, and snapshots. This proposal extends metadata support at both table and column levels for each resource type.

Resource types to be supported:

Examples of External Schemas for Lightdash:

JSON Schema Example

{
  "models": {
    "type": "array",
    "items": {
      "type": "object",
      "required": ["name"],
      "properties": {
        "name": { "type": "string" },
        "description": { "type": "string" },
        "access": {
          "type": "string",
          "enum": ["private", "protected", "public"]
        },
        "columns": {
          "type": "array",
          "items": {
            "$ref": "#/$defs/column_properties"
          }
        },
        "config": { "$ref": "#/$defs/model_configs" },
        "constraints": { "$ref": "#/$defs/constraints" },
        "data_tests": {
          "type": "array",
          "items": { "$ref": "#/$defs/data_tests" }
        },
        "deprecation_date": { "type": "string" },
        "docs": { "$ref": "#/$defs/docs_config" },
        "group": { "$ref": "#/$defs/group" },
        "latest_version": { "type": "number" },
        "meta": {
          "anyOf": [
            { "type": "object" },
            { "$ref": "./meta/lightdash/model_table_meta.json" }
          ]
        },
        "tests": {
          "type": "array",
          "items": { "$ref": "#/$defs/data_tests" }
        },
        "versions": {
          "type": "array",
          "items": {
            "type": "object",
            "required": ["v"],
            "properties": {
              "columns": {
                "type": "array",
                "items": {
                  "anyOf": [
                    { "$ref": "#/$defs/include_exclude" },
                    { "$ref": "#/$defs/column_properties" },
                    { "$ref": "./meta/lightdash/model_column_meta.json" }
                  ]
                }
              },
              "config": { "$ref": "#/$defs/model_configs" },
              "v": { "type": "number" }
            }
          }
        }
      },
      "additionalProperties": false
    }
  }
}
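
The example above references ./meta/lightdash/model_column_meta.json without showing what that file contains. As a minimal sketch only, the external schema might look something like the following. It is not Lightdash's actual schema: it assumes nothing beyond the shape used in the schema.yml example earlier in this thread (a metrics object whose keys are metric names, each with a type), and the title, descriptions, and examples values are illustrative.

```json
{
  "title": "Lightdash column meta (hypothetical sketch)",
  "description": "Assumed shape, derived only from the schema.yml example in this issue.",
  "type": "object",
  "properties": {
    "metrics": {
      "description": "Map of metric name to metric definition.",
      "type": "object",
      "additionalProperties": {
        "type": "object",
        "required": ["type"],
        "properties": {
          "type": {
            "description": "Aggregation type. Left as a free string because the full list of allowed values is not given in this thread.",
            "type": "string",
            "examples": ["count_distinct", "sum"]
          }
        }
      }
    }
  }
}
```

One design point to keep in mind: anyOf passes as soon as any branch matches, and the first branch in the proposal, { "type": "object" }, matches every object. With that branch in place, a file like this would not cause malformed metrics to be rejected. Its benefit would mainly be in editors that use anyOf branches for completion and hover documentation.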