dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0
11.29k stars 1.42k forks source link

Concurrency Limits as a Resource Property #19173

Open Daniel-Vetter-Coverwhale opened 8 months ago

Daniel-Vetter-Coverwhale commented 8 months ago

What's the use case?

I think it would be really awesome to be able to specify a concurrency limit as a property of a resource.

For example, to say this is the resource for my connection to the production database, and at any time I don't want there to be more than say 10 ops using this resource. This can sort of be worked out by using the resource lifecycle hooks, but I think that would result in a spurious run failure, and so I think it would be better if this information were available and used further upstream, either by the coordinator, the run launcher, the auto-materialize daemon, or some other piece that I'm not thinking of.

To that end, having the information available on the resource itself seems really useful. It also allows for the separation of concerns for different authors, so whoever is writing the asset doesn't have to worry about overloading the resource, and the author of the resource doesn't have to worry about having that resource being overloaded.

I think we would want to expose this information in the UI, both the limits themselves on the resources, and also the utilization and utilization attempts. For things that are scheduled that use the resource, Dagster could even proactively show that the resource is likely to be overloaded at the given time, which could lead to staggering the schedules manually, accepting that one of the ops/jobs will be delayed, or scaling the resource and increasing the limits. Reactively I think it would definitely be important to show when a job was delayed due to resourcing.

I thought of this recently as I was trying to set concurrency limits via op tags on some generated sling assets, but the op tags don't appear to be populating on the job for auto-materializing the assets, and I'm not sure if that's because op_tags in general don't get added to job config or if I'm just missing where I should be looking for op_tags on assets or something else.

Ideas of implementation

No response

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

cbini commented 1 week ago

This would be useful at a resource level for sensors/schedules as well. One of our vendors throttles the frequency of connections from a single IP. Their SFTP server has one hostname, but we have multiple instances of the product with their own respective logins, so it's difficult to coordinate which login is allowed to access the server at a given time.