dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0
11.17k stars 1.4k forks

Grouping jobs #6146

Open fgiroud opened 2 years ago

fgiroud commented 2 years ago

Use Case

We have, or will have, a lot of jobs, possibly up to 100-200. With that many, a flat list of jobs in the UI becomes difficult to read.

Ideas of Implementation

The ideal solution would be to allow the slash character (/) in a job name. Then, from the job names alone, you could construct a tree of jobs in which groups can be expanded or collapsed. Another solution would be to introduce a "group" property, but a solution based purely on the job name avoids unnecessary config.
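To illustrate the name-based idea, here is a minimal sketch in plain Python of how a tree could be derived from slash-separated names. It does not use any Dagster API; the job names and the helpers `build_job_tree` and `render_tree` are hypothetical and only show the grouping logic.

```python
from typing import Dict, Iterable, List


def build_job_tree(job_names: Iterable[str]) -> Dict[str, dict]:
    """Nest slash-separated job names into a dict-of-dicts tree."""
    tree: Dict[str, dict] = {}
    for name in job_names:
        node = tree
        # Each path segment becomes one level of the tree.
        for part in name.split("/"):
            node = node.setdefault(part, {})
    return tree


def render_tree(tree: Dict[str, dict], indent: int = 0) -> List[str]:
    """Return the tree as indented lines, one per group or job."""
    lines: List[str] = []
    for key in sorted(tree):
        lines.append("  " * indent + key)
        lines.extend(render_tree(tree[key], indent + 1))
    return lines


if __name__ == "__main__":
    # Hypothetical job names, assuming "/" were allowed in a job name.
    names = [
        "ingestion/orders",
        "ingestion/customers",
        "reporting/daily/sales",
        "cleanup",
    ]
    print("\n".join(render_tree(build_job_tree(names))))
```

A UI could treat every node that has children as a collapsible group and every leaf as a job.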

Additional Info

We haven't explored splitting things into different repositories yet. That might help, but it should not be a prerequisite for grouping jobs.


Message from the maintainers:

Excited about this feature? Give it a :thumbsup:. We factor engagement into prioritization.

yuhan commented 2 years ago

Hi @fgiroud, thanks for opening this issue! I think it’s a cool idea and something we’d like to address in a future release.

In the meantime, exactly as you mentioned, I'd recommend splitting jobs into different repositories.

IgnorantWalking commented 1 year ago

Hi, and sorry for reviving this one-year-old issue.

Has there been any progress or discussion about adding capabilities in dagit to group and filter jobs and assets? :roll_eyes:

Another idea, complementary to the one proposed in this issue and in #6162, would be the ability to split one deployment into a sort of "definition groups". Each group would be defined either by a filter query applied across the contents of all the deployment's code locations (filter by tag, by job or asset name, etc.) or by a split expression (for example, split the deployment contents by the values of a specific tag). This is similar to the config we apply to define the queues of the queued run coordinator, but focused on the GUI representation.
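As a rough sketch of what I mean, in plain Python and without any real Dagster API: the `Definition` class, the tag keys, and the functions `group_by_filter` and `split_by_tag` are all hypothetical and only illustrate the two ways a group could be defined.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Definition:
    """Hypothetical stand-in for a job or asset shown in the UI."""
    name: str
    code_location: str
    tags: Dict[str, str] = field(default_factory=dict)


def group_by_filter(
    definitions: Iterable[Definition],
    predicate: Callable[[Definition], bool],
) -> List[Definition]:
    """A group defined by a filter query over all code locations."""
    return [d for d in definitions if predicate(d)]


def split_by_tag(
    definitions: Iterable[Definition], tag_key: str
) -> Dict[str, List[Definition]]:
    """A set of groups defined by the values of one tag."""
    groups: Dict[str, List[Definition]] = defaultdict(list)
    for d in definitions:
        if tag_key in d.tags:
            groups[d.tags[tag_key]].append(d)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical definitions spread over two code locations.
    defs = [
        Definition("load_orders", "spark_location", {"vertical": "sales"}),
        Definition("sales_report", "dbt_location", {"vertical": "sales"}),
        Definition("hr_report", "dbt_location", {"vertical": "hr"}),
    ]
    by_vertical = split_by_tag(defs, "vertical")
    print({k: [d.name for d in v] for k, v in by_vertical.items()})
    reports = group_by_filter(defs, lambda d: d.name.endswith("_report"))
    print([d.name for d in reports])
```

Both functions ignore `code_location` when grouping, so a group can span code locations without changing how the code is deployed.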

A dagit user should be able to see all the deployment contents, as they can today, or select one of the "definition groups" to see and interact only with the filtered ones. That would allow a cross-code-location view of jobs and assets without requiring any structural change in the repositories themselves, keeping the code-location structure separate from the grouped view in dagit.

Being able to apply different RBAC policies to these "definition groups" inside a deployment would also be terrific, of course.

A real scenario that justifies this kind of flexibility in the view is splitting the deployment contents by use case. In a data-oriented company, we usually have data pipelines focused on ingestion, but also pipelines focused on analytics or reporting for different stakeholders and business verticals. All of them usually require combining different technologies, in the form of assets defined across different code locations, so the view grouped by code location reflects the technical implementation, not the business scenario. Using "definition groups", we could show and classify all of our data jobs and assets by the use case they serve (tagging them by business vertical, department, or use-case name) instead of by the technical environment they need for execution.

In the same way, in a complex multi-tenancy scenario where the same set of assets is created for different customers, the ability to group them into these "definition groups" without any impact on the execution infrastructure (no extra agents, no extra gRPC servers, etc.) would be incredibly useful.

It would be fantastic to see some progress on this, or to know whether any discussions are ongoing.

Thank you very much! :bowing_man: