dataform-co / dataform

Dataform is a framework for managing SQL based data operations in BigQuery
https://cloud.google.com/dataform/docs
Apache License 2.0
849 stars, 160 forks

Provide `--json` option for `run` command #1217

Closed (michicodes closed 7 months ago)

michicodes commented 3 years ago

We are using Dataform inside GCP and want to emit Pub/Sub events based on the outcome of `dataform run`.

Which `*.sqlx` files are run depends on tagging. The only way to figure out which of them were affected would be to parse the text output. It looks something like this:

Dataset created:  some_database.some_view [view]
Dataset created:  some_database.some_table [incremental]
Dataset created:  some_database.some_table [incremental]

Being able to get a JSON representation here as well would help tremendously. Right now `--json` is not respected by the `run` command.

Cheers, Michi
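Until structured output exists, the text-parsing workaround mentioned above can be sketched roughly as follows. This is a minimal sketch that assumes every relevant line has exactly the shape shown in the sample (`Dataset created:  <schema>.<name> [<type>]`). The function name `parse_run_output` and the output keys are hypothetical, and any other line shapes the CLI may print (for example failures) are simply skipped.

```python
import json
import re

# Assumed line shape, copied from the sample output in this issue:
#   "Dataset created:  <schema>.<name> [<type>]"
# Other kinds of lines are not covered by this pattern.
LINE_RE = re.compile(
    r"^Dataset created:\s+(?P<target>\S+)\s+\[(?P<type>[^\]]+)\]\s*$"
)


def parse_run_output(text: str) -> list[dict]:
    """Turn 'Dataset created' lines into a list of plain dicts."""
    results = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # ignore anything that does not look like the sample
        # Split on the last dot so "a.b.c" yields schema "a.b" and name "c".
        schema, _, name = match.group("target").rpartition(".")
        results.append(
            {"schema": schema, "name": name, "type": match.group("type")}
        )
    return results


if __name__ == "__main__":
    sample = (
        "Dataset created:  some_database.some_view [view]\n"
        "Dataset created:  some_database.some_table [incremental]\n"
    )
    print(json.dumps(parse_run_output(sample), indent=2))
```

The resulting list could then be serialized and used as an event payload. Parsing human-readable output is brittle, which is why a real `--json` mode for `run` would be preferable.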

michicodes commented 3 years ago

Looking a little closer, it seems the `--json` flag is only used to display the compiled graph before actually running it, not the executed graph of the run result. Do you have plans to add that?

Ekrekr commented 1 year ago

To save it to a JSON file you can redirect the output, e.g. `dataform compile --json >> file.json`. An output file option would be a more elegant solution though!
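For anyone who wants to do the same from a script rather than the shell, here is a minimal sketch. It assumes the command prints nothing but a single JSON document on stdout; the helper names `run_and_parse_json` and `save_json` are made up for illustration. Note that `>>` appends, so running the shell version twice leaves two JSON documents in one file, while the sketch below overwrites the file each time.

```python
import json
import subprocess
from typing import Any


def run_and_parse_json(command: list[str]) -> Any:
    """Run a command and parse its stdout as one JSON document.

    Assumes stdout contains only JSON; raises CalledProcessError on a
    non-zero exit code and json.JSONDecodeError on malformed output.
    """
    completed = subprocess.run(
        command, capture_output=True, text=True, check=True
    )
    return json.loads(completed.stdout)


def save_json(data: Any, path: str) -> None:
    """Write data to path as JSON, replacing any existing file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2)


if __name__ == "__main__":
    # The command is the one quoted in the comment above.
    graph = run_and_parse_json(["dataform", "compile", "--json"])
    save_json(graph, "file.json")
```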

andres-lowrie commented 1 year ago

Being able to get the execution graph from the CLI would be really useful.

andres-lowrie commented 1 year ago

In case anyone lands on this via a search engine: the `run` command does respect the `--json` flag. Check it out over here: https://github.com/dataform-co/dataform/blob/main/cli/index.ts#L636-L644

@michicodes not sure if this covers your point though

Ekrekr commented 7 months ago

You're right that the JSON option is allowed for the `run` command, but it currently doesn't do anything unless dry run is used: https://github.com/dataform-co/dataform/blob/c0d1a7400f4aed74f90e032149561a3027df0ea4/cli/index.ts#L513

This makes sense, because otherwise there would be no output about whether tables were created or not by the run. To prevent confusion, I'm instead throwing an error if json is specified without dryRun: https://github.com/dataform-co/dataform/pull/1697.
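The check described above can be illustrated with a small stand-alone sketch. This is not the code from the linked PR (the real CLI is written in TypeScript); it uses Python's `argparse`, and the flag spellings `--json` and `--dry-run`, the program name, and the error message are assumptions made for illustration only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the `run` command's option parsing.
    parser = argparse.ArgumentParser(prog="run-sketch")
    parser.add_argument("--json", action="store_true",
                        help="print output as JSON")
    parser.add_argument("--dry-run", action="store_true",
                        help="compute the run without executing it")
    return parser


def parse_run_args(argv: list[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Reject the combination that would otherwise silently do nothing.
    if args.json and not args.dry_run:
        # parser.error prints usage to stderr and exits with status 2.
        parser.error("--json is only supported together with --dry-run")
    return args


if __name__ == "__main__":
    print(parse_run_args(["--json", "--dry-run"]))
```

Failing fast like this trades a silent no-op for an explicit message, so users find out immediately that `--json` on its own has no effect.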

andres-lowrie commented 7 months ago

> You're right that the JSON option is allowed for the run command, but it doesn't actually do anything currently unless the dry run is used
>
> https://github.com/dataform-co/dataform/blob/c0d1a7400f4aed74f90e032149561a3027df0ea4/cli/index.ts#L513
>
> This makes sense, because otherwise there would be no output about whether tables were created or not by the run. To prevent confusion, I'm instead throwing an error if json is specified without dryRun: #1697.

sweet... that will make the ux a bit nicer 😊