dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Asset Groups and Subgroups #11826

Open kuchedav opened 1 year ago

kuchedav commented 1 year ago

What's the use case?

The Dagster asset graph can become very large. To better manage workflows, it would be useful to be able to define not only categories but also sub-categories.

Ideas of implementation

@dagster.asset(group_name=["group_name", "sub_group_name"])

Groups could be defined as a list (maybe with N levels).

From a visualization standpoint, there are a few options for the global asset graph:

  1. Subgroups appear normally, like groups do.
  2. Subgroups could be grouped into one box in the global asset graph.
  3. Subgroups do not appear in the global asset graph, but only once the supergroup has been selected.

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

mkleinbort-ic commented 1 year ago

I 100% would love this feature.

I use the Dagit graph to communicate with my stakeholders, and a good, clean UI is extremely valuable for driving trust and adoption.

I asked about this in Slack some time ago; this was my mock-up of what I was asking for:

image

mkleinbort-ic commented 1 year ago

Also had the same syntax in mind:

@dagster.asset(group_name=["group_name", "sub_group_name"])
def my_new_super_cool_asset(dep1, dep2, dep3):
    ...

P.S. I'm not sure how easy it would be to work out the UI rendering. In theory, this could be very hard without validating the "group" hierarchy.

Consider

@asset(group_name=['townspeople', 'barbers'])
def X():
    ...

@asset(group_name=['barbers', 'men'])
def Y():
    ...

@asset(group_name=['men', 'townspeople'])
def Z():
    ...

mkleinbort-ic commented 1 year ago

I'd accept the limitation that sub-groups MUST be wholly contained within a single group.
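
The validation concern can be made concrete. The following is a minimal Python sketch, not part of Dagster: find_hierarchy_problems is a hypothetical helper that assumes group names are global (tag-like) and that group_name accepts the proposed list syntax. It reports subgroups that sit under more than one parent, as well as circular nesting like the X, Y, Z example above.

```python
def find_hierarchy_problems(paths):
    """Check list-style group paths, treating each group name as one global node.

    Returns a list of human-readable problems. An empty list means every
    subgroup is wholly contained within a single parent group.
    """
    parent_of = {}  # child name -> first parent name seen for it
    problems = []

    for path in paths:
        for parent, child in zip(path, path[1:]):
            known = parent_of.setdefault(child, parent)
            if known != parent:
                problems.append(
                    f"{child!r} is nested under both {known!r} and {parent!r}"
                )

    # Walk upwards from every name; revisiting a name means the nesting is circular.
    for start in parent_of:
        seen = {start}
        node = start
        while node in parent_of:
            node = parent_of[node]
            if node in seen:
                problems.append(f"circular nesting reached from {start!r}")
                break
            seen.add(node)

    return problems


cyclic = [["townspeople", "barbers"], ["barbers", "men"], ["men", "townspeople"]]
for problem in find_hierarchy_problems(cyclic):
    print(problem)
```

Under this tag-like reading, a UI would have to reject or specially render such definitions, which is why a stricter containment rule is attractive.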

kuchedav commented 1 year ago

I agree, the groups should behave more like a file system than a tag system. Hence, in your example, 'barbers' and 'men' would occur multiple times at different levels of the hierarchy instead of being grouped together.

Also consider the case:

@asset(group_name=['computer', 'mouse'])
def X():
    ...

@asset(group_name=['mammal', 'mouse'])
def Y():
    ...

These would be two different 'mouse' sub-categories and would not be combined into one group.
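
To illustrate the file-system-like semantics, here is a small Python sketch, independent of Dagster. The helper name build_group_tree is hypothetical, and it assumes each asset's group is given as a list of names from the top-level group down. Because a subgroup is identified by its full path, the two 'mouse' entries end up as separate nodes.

```python
def build_group_tree(paths):
    """Build a nested dict of groups, where each path is like a directory path."""
    tree = {}
    for path in paths:
        node = tree
        for name in path:
            # Reuse the child if this exact path prefix exists, else create it.
            node = node.setdefault(name, {})
    return tree


tree = build_group_tree([["computer", "mouse"], ["mammal", "mouse"]])

# The two 'mouse' subgroups are distinct nodes under different parents.
assert tree == {"computer": {"mouse": {}}, "mammal": {"mouse": {}}}
assert tree["computer"]["mouse"] is not tree["mammal"]["mouse"]
```

With this representation no extra validation is needed: a subgroup can only ever belong to the parent named in its own path.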

mkleinbort-ic commented 1 year ago

I agree. It's similar to naming the groups like this:

@asset(group_name='computer:mouse')
def X():
    ...

@asset(group_name='mammal:mouse')
def Y():
    ...

Then most of the logic would live in the Dagit UI.

Do you have any estimate of when this would get done (if it gets done)? I was close to cobbling this feature together myself, but I don't know any front-end coding.

sryza commented 1 year ago

Hi @mkleinbort-ic - this is a solid idea that we are likely interested in implementing. However, other parts of our roadmap are higher priority right now, so it's unlikely that we'll add this within the next couple months.

mkleinbort-ic commented 7 months ago

Thanks @sryza

For the time being I am writing my assets like this:

@asset(group_name=group_hirearchy(['computer', 'mouse']))
def X():
    ...

@asset(group_name=group_hirearchy(['mammal', 'mouse']))
def Y():
    ...

Here, group_hirearchy is a function that I modify often:

def group_hirearchy(hirearchy: list[str]) -> str | None:
    ...

I can change it to alter the layout of the Dagster UI/groups depending on my work that day.

For example:

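def group_hirearchy(hirearchy: list[str]) -> str | None:
    # Only give an explicit group name to assets on the feature I'm working on.
    if 'FEATURE_22' in hirearchy:
        return '__'.join(hirearchy)
    # Everything else gets no explicit group name.
    return None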
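
A slightly more defensive variant of the same idea is sketched below in plain Python. The names encode_group_path and decode_group_name are hypothetical, and the sketch assumes that group names may only contain letters, digits, and underscores (if so, a ':' separator would not be accepted, which is one reason to join with '__'). Segments are restricted so that '__' can only ever appear as the separator, which keeps the encoding reversible.

```python
import re

SEPARATOR = "__"  # hypothetical convention, not something Dagster interprets

# A segment is runs of letters/digits joined by single underscores, so it can
# never contain the separator or start/end with an underscore.
_SEGMENT = re.compile(r"[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*")


def encode_group_path(path):
    """Flatten a list such as ['computer', 'mouse'] into 'computer__mouse'."""
    if not path:
        raise ValueError("group path must contain at least one segment")
    for segment in path:
        if not _SEGMENT.fullmatch(segment):
            raise ValueError(f"invalid group segment: {segment!r}")
    return SEPARATOR.join(path)


def decode_group_name(name):
    """Recover the list of segments from a flattened group name."""
    return name.split(SEPARATOR)


flat = encode_group_path(["computer", "mouse"])
assert flat == "computer__mouse"
assert decode_group_name(flat) == ["computer", "mouse"]
```

The decoded path could then feed whatever tooling is used to browse or filter groups, until nested groups are supported natively.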

IAL32 commented 2 weeks ago

Our company is a multi-tenant organization, and the only thing we can do for now is name each group after a tenant. We cannot do any further subgrouping, which makes the UI very hard to navigate... The only thing we can do is rely on filtering with tags.