dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

Extract the visualization code for the model graph without the docs site #10429

Closed htpaf closed 3 months ago

htpaf commented 3 months ago

Is this your first time submitting a feature request?

Describe the feature

The context is https://github.com/dbt-labs/dbt-core/issues/4009 -> "Visualize the model graph without the docs site"

Something makes the zoomable graph in a webpage possible when running dbt docs generate and dbt docs serve.

I want to be able to run "something" on the command line that supports --select XYZ and get the same graph that the dbt docs serve produces for visualizing the graph network. Especially the layout that can be zoomed into such that a larger graph is still viewable.

Given https://github.com/dbt-labs/dbt-core/issues/4009 I am not sure where the functionality lives. I am assuming that something in dbt-core does this and it is a matter of extracting the existing functionality into a standalone piece. In that sense, this might be an enhancement request.

I tried using the https://docs.getdbt.com/reference/artifacts/other-artifacts#graphgpickle with networkx but the results are poor. The layout-engine used in dbt-core visualization is the part that is most interesting, as that pretty much makes or breaks the usability from my brief experience of trying to re-create the visualization.

Describe alternatives you've considered

I tried using the https://docs.getdbt.com/reference/artifacts/other-artifacts#graphgpickle with networkx but the results are poor. The layout engine used in dbt-core visualization is the part that is most interesting, as that pretty much makes or breaks the usability from my brief experience of trying to re-create the visualization.

I've also looked at hotdag but it did not work out of the box given my version of Python packages and fails with an error.

Who will this benefit?

Anyone who develops models and wants a quick look at their graph dependencies. When the project becomes moderately large it is hard to understand how things are connected, but you are often only interested in certain parts, not the entire project.

The functionality is already there but you can only access the functionality needed by using a browser and by clicking around. It would be good to be able to extract parts of the graph and possibly augment the nodes with more information, to be able to send to other people for example.

I, for example, foresee creating a mini version of the graph where it is possible to augment the nodes with on-hover functionality if the extraction is an interactive web page.

Are you interested in contributing this feature?

Would need an explanation of how it works

Anything else?

If an explanation is given with enough code/API details I would try to extract and combine the parts needed. As it stands I lack the knowledge of how the graph is constructed and its dependencies.

dbeatty10 commented 3 months ago

Thanks for reaching out @htpaf !

Where the functionality lives

The functionality that provides the zoomable graph visualization is located within the dbt-docs repo (specifically, here I believe).

The dbt-docs repo loads that DAG into a JavaScript representation and produces visualizations using Cytoscape.js. Logic within that JavaScript then allows for visualizing subsets of the DAG, like those chosen with --select XYZ.
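To make the idea of "a subset of the DAG" concrete, here is a minimal Python sketch of an ancestor selection, roughly what a selector such as +my_model asks for. It is not the dbt-docs code: the function name ancestors and the four-node project below are hypothetical and exist only for illustration.

```python
from collections import deque


def ancestors(parents, node):
    """Return `node` plus every node reachable by following parent links."""
    seen = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical project: c depends on b, b depends on a, d stands alone.
parents = {"a": [], "b": ["a"], "c": ["b"], "d": []}

# Roughly what "+b" selects: b and everything upstream of it.
selected = ancestors(parents, "b")
print(sorted(selected))
```

A visualization only needs to keep the nodes in the selected set, and the edges whose endpoints are both in it, before handing the graph to a layout engine.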

Feature request

It sounds like you want to be able to run a CLI command that will output the same visualization as the one included in the docs website launched by dbt docs serve. This comment in #4009 did a good job of explaining that this isn't something we want to add to dbt-core, and I'm going to close this issue for the same reasons.

Building your own

That comment also has some good hints on how you could approach building your own.

The key insights are:

Here's an example command:

$ dbt list -s +my_model --output json --output-keys unique_id depends_on --quiet

The output looks like this:

{"unique_id": "model.my_project.my_model", "depends_on": {"macros": [], "nodes": []}}
{"unique_id": "model.my_project.my_model_2", "depends_on": {"macros": [], "nodes": ["model.my_project.my_model"]}}

From there, you can transform that to whatever graph representation is needed for your preferred graph visualization library.
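As one possible version of that transformation, the sketch below turns the JSON lines shown above into the nodes/edges structure that Cytoscape.js accepts as its elements. The function name to_cytoscape_elements and the edge id format are assumptions made for illustration; only the unique_id and depends_on keys come from the command output above.

```python
import json


def to_cytoscape_elements(lines):
    """Convert `dbt list --output json` lines into Cytoscape.js elements."""
    nodes = {}
    edges = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        child = record["unique_id"]
        nodes.setdefault(child, {"data": {"id": child}})
        for parent in record.get("depends_on", {}).get("nodes", []):
            # A parent may fall outside the selection; add it so the edge resolves.
            nodes.setdefault(parent, {"data": {"id": parent}})
            edges.append(
                {"data": {"id": f"{parent}->{child}", "source": parent, "target": child}}
            )
    return {"nodes": list(nodes.values()), "edges": edges}


# The two sample lines from the output above.
sample = [
    '{"unique_id": "model.my_project.my_model", "depends_on": {"macros": [], "nodes": []}}',
    '{"unique_id": "model.my_project.my_model_2", "depends_on": {"macros": [], "nodes": ["model.my_project.my_model"]}}',
]

elements = to_cytoscape_elements(sample)
print(json.dumps(elements, indent=2))
```

In practice the lines would come from a file or a pipe holding the command's output rather than from a hard-coded list. The same loop could just as well emit an edge list for networkx or another library.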

htpaf commented 3 months ago

I am sorry to hear that. You may say that dbt-core does not want that functionality which is fine. However, I would guess that the primary use of dbt docs is the interactive visual graph and as such is a 'core' functionality for users of dbt. It is only a guess of course. It would be easy enough to remove the graphing functionality and see if many complain or not.

I mentioned that I have tried the suggestion in the referenced issue. It is not a substitute for an interactive graph.

I appreciate the link to the JavaScript part; I will investigate from there.