Open jburnich opened 2 years ago
We could potentially add a run_tags attribute to AssetsDefinitions. Something we'd need to deal with though: what if two different assets provide different values for the same tag, and those assets are included in the same run?
Thanks @sryza! Maybe do a check on the tags when defining the job, so that if several assets have identical keys with different values, an error is raised. I don't know if this is feasible in practice.
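The check suggested here could look roughly like the sketch below. It is plain Python, not Dagster code: `collect_run_tags` and `RunTagConflictError` are hypothetical names, and the per-asset mapping stands in for the proposed run_tags attribute, which does not exist yet.

```python
from typing import Dict, Mapping


class RunTagConflictError(Exception):
    """Hypothetical error: two assets in one run disagree on a tag value."""


def collect_run_tags(asset_run_tags: Mapping[str, Mapping[str, str]]) -> Dict[str, str]:
    """Merge per-asset run tags, failing if a key has conflicting values.

    `asset_run_tags` maps an asset name to the run tags it declares
    (an assumed shape, chosen only for illustration).
    """
    merged: Dict[str, str] = {}
    owner: Dict[str, str] = {}  # tag key -> asset that first set it
    for asset_name, tags in asset_run_tags.items():
        for key, value in tags.items():
            if key in merged and merged[key] != value:
                raise RunTagConflictError(
                    f"Tag {key!r} is {merged[key]!r} on asset {owner[key]!r} "
                    f"but {value!r} on asset {asset_name!r}"
                )
            merged.setdefault(key, value)
            owner.setdefault(key, asset_name)
    return merged
```

Identical values for the same key are accepted; only a genuine disagreement raises, so the error could be reported at job definition time.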
Feedback from Philippe Laflamme: https://dagster.slack.com/archives/C01U954MEER/p1673529429283949
I have an @asset that queries an endpoint when materializing. This endpoint has a concurrency limit of n, which means I need to only materialize n assets concurrently. I’ve set up the QueuedRunCoordinator with tag_concurrency_limits, say http/limit: n, which limits to n. How do I apply this tag to my @asset? I’ve tried using op_tags but that doesn’t seem to apply it when materializing; and for backfilling, I’ve found that:
--from x --to y --tags n
But at the end of the day, I always want this tag to be applied when materializing this asset. Is this possible?
I just want to give my 5 cents.
This is also an issue for me, for the same reason Philippe points out. Even if there is a better fix than throwing an error when tags collide, one could start by implementing this with the error and fix the edge case later.
Tags seem super neat, and it's a shame we cannot utilize them more in software-defined assets! In general, I was expecting jobs to simply inherit all tags from the ops/graphs below them. Maybe, for the time being, the documentation should note that these two types of tags are not strictly related.
A data point here from a user we spoke to about this:
Another user wants to use tags on runs of different assets to send alerts to the responsible team when auto materialize runs fail
- If a run includes multiple assets with different resource requirements, they want to use the maximum
One thing that would be problematic with this: if you have an asset that requires 5G and another that requires 3G, choosing 5G won't necessarily be enough. I would love to have a way to provide a function that is able to merge tags, or maybe some way of defining a merge strategy per tag (min, max, sum, any).
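A per-tag merge strategy along these lines could be sketched as follows. This is only an illustration in plain Python: `merge_numeric_tags`, the `STRATEGIES` table, and the `memory_gb` tag key are all hypothetical, and values are kept numeric for simplicity even though real Dagster tag values are strings.

```python
from typing import Callable, Dict, Iterable, List, Mapping

# Hypothetical strategies; each reduces the values declared by the assets
# in one run to a single value. "any" simply keeps the first value seen.
STRATEGIES: Dict[str, Callable[[List[float]], float]] = {
    "min": min,
    "max": max,
    "sum": sum,
    "any": lambda values: values[0],
}


def merge_numeric_tags(
    per_asset_tags: Iterable[Mapping[str, float]],
    strategy_by_key: Mapping[str, str],
    default: str = "max",
) -> Dict[str, float]:
    """Combine tags from several assets using a named strategy per tag key."""
    grouped: Dict[str, List[float]] = {}
    for tags in per_asset_tags:
        for key, value in tags.items():
            grouped.setdefault(key, []).append(value)
    return {
        key: STRATEGIES[strategy_by_key.get(key, default)](values)
        for key, values in grouped.items()
    }


assets = [{"memory_gb": 5}, {"memory_gb": 3}]
print(merge_numeric_tags(assets, {"memory_gb": "max"}))  # {'memory_gb': 5}
print(merge_numeric_tags(assets, {"memory_gb": "sum"}))  # {'memory_gb': 8}
```

With "max" the run would request 5G, which is the case described as possibly insufficient; "sum" would request 8G instead.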
I'll call out that if you're looking to use tags specifically for concurrency limits, you can check out op and asset concurrency limits which are available today.
+1
I would like to set tags={"dagster/max_retries": 3, "dagster/retry_strategy": "ALL_STEPS"} on an asset_graph.
Hi Dagster! Is it possible to add tags to a job only if a given asset is selected when the job is run?
I have, for example, a partitioned job with 2 assets (A -> B). The executions of asset A cannot run in parallel because of requests to an API, while the executions of asset B can run in parallel.
So I would like to add a tag so that as soon as asset A is selected, the executions run in series, and when only asset B is selected, the executions run in parallel. If both A and B are selected, the executions run in series. I then use this tag in the QueuedRunCoordinator.
It is possible to add the tag manually during the backfill, but I would like this step to be done automatically according to the selected assets.
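The behavior requested here, deriving run tags from whichever assets are selected, might be sketched like this. It is plain Python rather than an existing Dagster feature: `tags_for_selection`, the `ASSET_RUN_TAGS` table, and the `api/serial` tag are hypothetical, and the sketch assumes the QueuedRunCoordinator is separately configured to limit runs carrying that tag to one at a time.

```python
from typing import Dict, Iterable, Mapping

# Hypothetical declaration of which run tags each asset needs.
ASSET_RUN_TAGS: Dict[str, Dict[str, str]] = {
    "A": {"api/serial": "true"},  # A calls an API that cannot be hit in parallel
    "B": {},                      # B has no restriction
}


def tags_for_selection(
    selected: Iterable[str],
    asset_run_tags: Mapping[str, Mapping[str, str]],
) -> Dict[str, str]:
    """Return the union of the run tags declared by the selected assets."""
    tags: Dict[str, str] = {}
    for name in selected:
        tags.update(asset_run_tags.get(name, {}))
    return tags


print(tags_for_selection(["B"], ASSET_RUN_TAGS))       # {}
print(tags_for_selection(["A", "B"], ASSET_RUN_TAGS))  # {'api/serial': 'true'}
```

A run that selects only B carries no tag and is not throttled, while any run that includes A picks up the tag and would be serialized by the coordinator.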
Relevant requests: