Move tag-based concurrency management into clients

abrookins commented 3 months ago

Move tag-based concurrency handling client-side, implemented with global concurrency limits. This fixes #14360 and forms part of our larger effort to move all elements of task orchestration client-side.

Limitations and future work:

This changes the behavior of task runs waiting for a concurrency slot. Runs transition to Running before they acquire a slot. As future work, we could make runs that use tag-based concurrency transition to a named Running state, such as Running["AcquiringSlot"], and then transition to the normal Running state after acquiring a slot.
Task run concurrency limits can report which task runs are using the limits, but global concurrency limits do not report the entity using a limit. In future work, users will be able to see which task runs and flow runs are using global concurrency limits.
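
To make the first item concrete, here is a minimal sketch of the proposed two-step transition. It is not Prefect's state API: `State`, `run_with_slot`, and the `acquire`/`release`/`record` callables are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class State:
    """Hypothetical stand-in for a run state with an optional name."""
    type: str
    name: Optional[str] = None

    def label(self) -> str:
        return f'{self.type}["{self.name}"]' if self.name else self.type


def run_with_slot(
    acquire: Callable[[], None],
    release: Callable[[], None],
    work: Callable[[], object],
    record: Callable[[State], None],
) -> object:
    # Report a named Running state while the run is still waiting for a slot.
    record(State("Running", "AcquiringSlot"))
    acquire()
    try:
        # Only after the slot is held do we report the normal Running state.
        record(State("Running"))
        return work()
    finally:
        # Always give the slot back, even if the work raises.
        release()
```

With this shape, anything watching state history could tell a run that is waiting on a limit apart from one that is executing.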

Example

This PR changes tag-based task concurrency to use global concurrency limits. When a global concurrency limit exists whose name matches a tag on your task, we will apply that limit to the task when it runs. If you want to create a limit to match a tag on your task, you should create a global concurrency limit, not a task run concurrency limit. Future work will likely consolidate these concepts.
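
The matching rule can be sketched in plain Python. This is an illustration of the idea only, not the code in this PR: `GLOBAL_LIMITS`, `acquire_matching_limits`, and `run_task` are hypothetical names, and an in-memory semaphore stands in for a server-side global concurrency limit.

```python
import threading
from contextlib import ExitStack, contextmanager

# Hypothetical in-memory stand-in for global concurrency limits, keyed by name.
# Here a limit named "database" allows two concurrent holders.
GLOBAL_LIMITS = {"database": threading.BoundedSemaphore(2)}


@contextmanager
def acquire_matching_limits(tags):
    """Hold one slot in every limit whose name matches one of the task's tags."""
    # Sort the names so concurrent callers always acquire in the same order.
    matched = sorted(tag for tag in tags if tag in GLOBAL_LIMITS)
    with ExitStack() as stack:
        for name in matched:
            limit = GLOBAL_LIMITS[name]
            limit.acquire()  # blocks until a slot is free
            stack.callback(limit.release)  # released when the block exits
        yield matched


def run_task(fn, tags):
    """Run fn while holding slots for any limits that match its tags."""
    with acquire_matching_limits(tags) as matched:
        return fn(), matched
```

A task tagged `{"database", "etl"}` would take a slot in the `database` limit only, because no limit named `etl` exists; tags without a matching limit are ignored.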

Checklist

[ ] This pull request includes a label categorizing the change e.g. maintenance, fix, feature, enhancement, docs.
[ ] This pull request references any related issue by including "closes <link to issue>"
- If no issue exists and your change is not a small fix, please create an issue first.
[ ] If this pull request adds new functionality, it includes unit tests that cover the changes
[ ] If this pull request removes docs files, it includes redirect settings in mint.json.
[ ] If this pull request adds functions or classes, it includes helpful docstrings.

zhen0 commented 2 months ago

@abrookins - doing a bit of maintenance as we have a lot of potentially stale PRs. Is this one that needs action? Or can it be closed?

abrookins commented 2 months ago

Still working on this one! 👍

abrookins commented 2 months ago

@zangell44 Good questions! We may need to expand the concurrency v2 API with more functionality. I'm thinking about this now. I didn't understand question #4.

codspeed-hq[bot] commented 2 months ago

CodSpeed Performance Report

Merging #14382 will not alter performance

Comparing global-concurrency-tags (69e8665) with main (0d23f58)

Summary

✅ 5 untouched benchmarks

zangell44 commented 2 months ago

I think the create_if_missing kwarg + functionality resolves questions 2 and 4. I do think 1 and 3 are still worthy of consideration.

1.) The concurrency v2 API does not track which object took a given slot. If the client crashes mid-run, we have no way of recovering a slot automatically.
3.) 3.x and 2.x task run limits will not be compatible with one another.

Question 3 may not have a solution beyond documenting the behavior.

abrookins commented 2 months ago

@zangell44 For 1), I think global concurrency limits should be able to tell you who or what is using them. I plan to add an API endpoint in a follow-up PR that looks at limit acquired and limit released events within a time window to determine which flow or task runs are currently using the limit.
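
Roughly, the lookup could replay events like this. This is a sketch under assumptions, not the planned endpoint: the `LimitEvent` shape, the `"acquired"`/`"released"` kind strings, and `current_holders` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Set


@dataclass(frozen=True)
class LimitEvent:
    """Hypothetical event record; field names are assumptions."""
    kind: str  # "acquired" or "released"
    limit_name: str
    run_id: str
    occurred: datetime


def current_holders(
    events: Iterable[LimitEvent], limit_name: str, since: datetime
) -> Set[str]:
    """Replay events from `since` onward and return runs still holding a slot."""
    held = {}  # run_id -> number of unreleased acquisitions
    for event in sorted(events, key=lambda e: e.occurred):
        if event.limit_name != limit_name or event.occurred < since:
            continue
        if event.kind == "acquired":
            held[event.run_id] = held.get(event.run_id, 0) + 1
        elif event.kind == "released":
            remaining = held.get(event.run_id, 0) - 1
            if remaining > 0:
                held[event.run_id] = remaining
            else:
                # Fully released, or released without a visible acquire.
                held.pop(event.run_id, None)
    return set(held)
```

One caveat of a time-window approach: a run that acquired its slot before the window starts will not show up, so the window has to be at least as long as the longest expected hold.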

For 4), the story should be a little simpler now that client-side concurrency limits will ship with client-side orchestration. That allows us to dump the version-checking code server-side because clients will only be using this new concurrency approach when they use client-side orchestration.

However, the fact remains that if you use client-side orchestration with a task whose tags you had previously created limits for, you would currently need to recreate the limits as global concurrency limits. I haven't spent much time thinking through how to smooth this for users.

PrefectHQ / prefect