dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[CT-2016] [Spike] Static artifact for CLI validation #6840

Open jtcohen6 opened 1 year ago

jtcohen6 commented 1 year ago

The ask: A language-agnostic data structure to validate dbt CLI commands, without actually requiring dbt-core to be imported/installed. (Could be a JSONSchema, doesn't have to be.) Let's get away from any need for naïve regex.

For starters, the goal here wouldn't even be to parse CLI strings into meaningful representations, just to say whether a given string is or isn't a valid CLI string. But I'd also imagine wanting to extend this into nicer-to-have territory (auto-complete, blocking certain options, extending with additional options).

As I see it, two options:

  1. Abstract the combination of commands + params one step further than we already have (today they are defined by means of Python methods & decorators), into a static data structure (e.g. JSON). Then, within dbt-core's CLI, consume that data structure to generate the CLI methods/decorators.
  2. Serialize click.cli to a static data structure, which could then be used (itself? by click? by another tool?) just to validate CLI strings.
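A rough sketch of what option 1 could look like, under loud assumptions: the JSON layout (`flags` / `options` per command) is hypothetical, the commands and flags in it are illustrative rather than dbt's real set, and the standard library's `argparse` stands in for click so the example has no third-party dependencies. It only shows the idea of "static spec in, parser out", not how dbt-core would actually do it.

```python
import argparse
import json
import shlex

# Hypothetical static spec. The layout and the listed flags are illustrative only.
SPEC_JSON = """
{
  "run":  {"flags": ["--full-refresh", "--fail-fast"], "options": ["--select", "--threads"]},
  "seed": {"flags": ["--full-refresh"], "options": ["--select"]}
}
"""


class RaisingParser(argparse.ArgumentParser):
    """ArgumentParser that raises instead of printing usage and exiting."""

    def error(self, message):
        raise ValueError(message)


def build_parser(spec):
    """Generate one sub-parser per command described in the spec."""
    parser = RaisingParser(prog="dbt", allow_abbrev=False, add_help=False)
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name, params in spec.items():
        sub = subparsers.add_parser(name, allow_abbrev=False, add_help=False)
        for flag in params.get("flags", []):
            sub.add_argument(flag, action="store_true")  # boolean switch
        for option in params.get("options", []):
            sub.add_argument(option)  # option that takes exactly one value
    return parser


def is_valid_cli_string(cli_string, parser):
    """Return True if the generated parser accepts the string."""
    tokens = shlex.split(cli_string)
    if tokens[:1] == ["dbt"]:
        tokens = tokens[1:]  # drop the program name
    try:
        parser.parse_args(tokens)
    except ValueError:
        return False
    return True


parser = build_parser(json.loads(SPEC_JSON))
print(is_valid_cli_string("dbt run --select my_model --full-refresh", parser))
print(is_valid_cli_string("dbt seed --threads 4", parser))
```

The same spec file could, in principle, be read by a non-Python consumer, which is the point of keeping it static.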

The closest thing I could find built into click is the to_info_dict method, which is really intended to support auto-generating documentation: https://click.palletsprojects.com/en/8.1.x/api/#click.Command.to_info_dict

>>> from click import Context
>>> from dbt.cli.main import cli
>>> with Context(cli) as ctx:
...     info = ctx.to_info_dict()
...
>>> info['command']['commands'].keys()
dict_keys(['build', 'clean', 'compile', 'debug', 'deps', 'docs', 'init', 'list', 'ls', 'parse', 'run', 'run-operation', 'seed', 'snapshot', 'source', 'test'])
>>> [param['name'] for param in info['command']['commands']['run']['params']]
['defer', 'favor_state', 'exclude', 'fail_fast', 'full_refresh', 'models', 'profile', 'profiles_dir', 'project_dir', 'select', 'selector', 'state', 'target', 'target_path', 'threads', 'vars', 'version_check', 'help']
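Note that `name` above is the Python parameter name (`full_refresh`), not the string typed on the command line. In click's info dict, each param also carries `opts` and `secondary_opts` (the literal option strings) and, for options, `is_flag`. The sketch below walks a dict of that shape to check a CLI string without importing dbt-core or click. `SAMPLE_INFO` is hand-written for illustration and is not dbt's real output, and the validator deliberately ignores `--opt=value` syntax, `nargs`, counted options, and top-level flags placed before the sub-command.

```python
import shlex

# Hand-written sample in the shape produced by Context.to_info_dict().
# The commands and params here are illustrative, not dbt's real set.
SAMPLE_INFO = {
    "command": {
        "name": "cli",
        "params": [],
        "commands": {
            "run": {
                "name": "run",
                "params": [
                    {"name": "fail_fast", "param_type_name": "option",
                     "opts": ["--fail-fast"], "secondary_opts": ["--no-fail-fast"],
                     "is_flag": True},
                    {"name": "full_refresh", "param_type_name": "option",
                     "opts": ["--full-refresh"], "secondary_opts": [],
                     "is_flag": True},
                    {"name": "select", "param_type_name": "option",
                     "opts": ["--select"], "secondary_opts": [],
                     "is_flag": False},
                ],
            },
        },
    }
}


def option_index(info):
    """Map each sub-command to {option string: takes_value}."""
    index = {}
    for cmd_name, cmd in info["command"].get("commands", {}).items():
        options = {}
        for param in cmd.get("params", []):
            if param.get("param_type_name") != "option":
                continue  # positional arguments are out of scope here
            takes_value = not param.get("is_flag", False)
            for opt in param.get("opts", []) + param.get("secondary_opts", []):
                options[opt] = takes_value
        index[cmd_name] = options
    return index


def validate(cli_string, index):
    """Return True if the sub-command and every option string are known."""
    tokens = shlex.split(cli_string)
    if tokens[:1] == ["dbt"]:
        tokens = tokens[1:]  # drop the program name
    if not tokens or tokens[0] not in index:
        return False
    options = index[tokens[0]]
    rest = iter(tokens[1:])
    for token in rest:
        if token not in options:
            return False
        if options[token] and next(rest, None) is None:
            return False  # option expects a value, but none follows
    return True


index = option_index(SAMPLE_INFO)
print(validate("dbt run --select my_model --no-fail-fast", index))
print(validate("dbt run --bogus", index))
```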

jtcohen6 commented 1 year ago

FYI to Execution team: I'm going to queue this up for estimation discussion. Not expecting a point estimate (since it's a spike), just expecting that you all know more than I do about this topic & might have some strong opinions!

dbeatty10 commented 1 year ago

Here's a Python script that will output an artifact named dbt-core-cli-flags.json:

generate_cli_flags_artifact.py

import json
from click import Context
from dbt.cli.main import cli

def convert_to_serializable(obj):
    # Convert non-serializable objects to strings
    return str(obj)

def serialize_dict_to_json(input_dict):
    # Use a custom conversion function when serializing
    return json.dumps(input_dict, indent=4, default=convert_to_serializable)

with Context(cli) as ctx:
    info = ctx.to_info_dict()

    pretty_json_string = serialize_dict_to_json(info)

    # Write the JSON string to a file
    with open("dbt-core-cli-flags.json", "w") as file:
        file.write(pretty_json_string)

Usage

python generate_cli_flags_artifact.py

Note: the dictionary can contain content that isn't serializable to JSON (like functions or other arbitrary Python objects), so this script just converts those to a string.
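To make the effect of that `default=` hook concrete, here is a small standalone sketch. The dict and the `sample_callback` function are hypothetical stand-ins, not values taken from dbt's real info dict. It shows that `json.dumps` writes tuples as arrays on its own, while a function only survives via the string fallback, so whatever is stringified cannot be turned back into the original object when the artifact is loaded.

```python
import json


def sample_callback():
    """Hypothetical stand-in for a callable stored in the info dict."""


# Hypothetical param entry mixing JSON-friendly and non-serializable values.
param_info = {
    "name": "threads",
    "choices": ("a", "b"),        # tuple: written as a JSON array
    "default": sample_callback,   # function: needs the default= fallback
}

# Without a fallback, json.dumps raises TypeError on the function.
try:
    json.dumps(param_info)
    plain_dump_ok = True
except TypeError:
    plain_dump_ok = False

# With default=str, anything json cannot encode is replaced by str(obj).
encoded = json.dumps(param_info, indent=4, default=str)
decoded = json.loads(encoded)

print(plain_dump_ok)
print(type(decoded["choices"]).__name__)
print(type(decoded["default"]).__name__)
```

A consumer of the artifact would therefore need to treat such string values as opaque descriptions rather than as data.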

Related internal Slack threads

(Added 2024-01-12)

dbeatty10 commented 9 months ago

Here is a script that produces the Markdown table below:

image

dbeatty10 commented 9 months ago

Here's a section of the documentation that outlines which sub-commands are available in dbt Core, dbt Cloud CLI, and dbt Cloud IDE:

image