flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0
5.17k stars, 550 forks

Change Flyte CR naming scheme to better support namespace_mapping #5480

Open ddl-ebrown opened 2 weeks ago

ddl-ebrown commented 2 weeks ago

@ddl-rliu did most of the work on this one - making this an upstream PR as it resolved a real issue for us.

How was this patch tested?

This is deployed in a live Flyte setup where we have automated tests. After this change, we observed that the CR names were unique and the original collision no longer occurred.

codecov[bot] commented 2 weeks ago

Codecov Report

Attention: Patch coverage is 90.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 61.00%. Comparing base (bba8c11) to head (d994304). Report is 2 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| flyteadmin/pkg/workflowengine/impl/k8s_executor.go | 85.71% | 1 Missing and 1 partial :warning: |

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master    #5480   +/-   ##
=======================================
  Coverage   60.99%   61.00%
=======================================
  Files         793      793
  Lines       51325    51366   +41
=======================================
+ Hits        31305    31334   +29
- Misses      17136    17146   +10
- Partials     2884     2886    +2
```

| [Flag](https://app.codecov.io/gh/flyteorg/flyte/pull/5480/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=flyteorg) | Coverage Δ | |
|---|---|---|
| [unittests-datacatalog](https://app.codecov.io/gh/flyteorg/flyte/pull/5480/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=flyteorg) | `69.31% <ø> (ø)` | |
| [unittests-flyteadmin](https://app.codecov.io/gh/flyteorg/flyte/pull/5480/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=flyteorg) | `58.73% <85.71%> (+0.02%)` | :arrow_up: |
| [unittests-flytecopilot](https://app.codecov.io/gh/flyteorg/flyte/pull/5480/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=flyteorg) | `17.79% <ø> (ø)` | |
| [unittests-flytectl](https://app.codecov.io/gh/flyteorg/flyte/pull/5480/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=flyteorg) | `67.97% <ø> (ø)` | |
| [unittests-flyteidl](https://app.codecov.io/gh/flyteorg/flyte/pull/5480/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=flyteorg) | `79.04% <ø> (ø)` | |
| [unittests-flyteplugins](https://app.codecov.io/gh/flyteorg/flyte/pull/5480/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=flyteorg) | `61.84% <ø> (+0.02%)` | :arrow_up: |
| [unittests-flytepropeller](https://app.codecov.io/gh/flyteorg/flyte/pull/5480/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=flyteorg) | `57.32% <100.00%> (-0.01%)` | :arrow_down: |
| [unittests-flytestdlib](https://app.codecov.io/gh/flyteorg/flyte/pull/5480/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=flyteorg) | `65.80% <ø> (-0.03%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=flyteorg#carryforward-flags-in-the-pull-request-comment) to find out more.


kumare3 commented 3 days ago

I am not in favor of this, as randomness will lead to leaky workflows and duplicates. We should use the project ID itself or generate a consistent hash to increase inter-project execution entropy.
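
For illustration, the consistent-hash idea could look roughly like the Go sketch below. This is not the code in this PR: `deterministicCRName`, the 20-character length, and the choice of SHA-256 with base32 are all assumptions made for the example. The only point is that the same project, domain, and execution name always map to the same CR name, while different projects map to different names.

```go
package main

import (
	"crypto/sha256"
	"encoding/base32"
	"fmt"
	"strings"
)

// deterministicCRName is a hypothetical helper (not a Flyte function).
// It derives a stable 20-character name from the execution identifier.
func deterministicCRName(project, domain, execName string) string {
	// A NUL separator keeps ("ab", "c") distinct from ("a", "bc").
	key := strings.Join([]string{project, domain, execName}, "\x00")
	sum := sha256.Sum256([]byte(key))

	// Lowercased unpadded base32 yields only [a-z2-7], which is valid
	// in a Kubernetes object name.
	enc := base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(sum[:])

	// Prefix a letter so the name never starts with a digit, then keep
	// the first 19 characters of the hash (assumed length).
	return "f" + strings.ToLower(enc)[:19]
}

func main() {
	fmt.Println(deterministicCRName("flytesnacks", "development", "abc123"))
}
```

Because the name is a pure function of its inputs, retrying the same execution would produce the same CR name instead of a second CR.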

ddl-ebrown commented 3 days ago

> I am not in favor of this, as randomness will lead to leaky workflows and duplicates. We should use the project ID itself or generate a consistent hash to increase inter-project execution entropy.

Ah, thanks @kumare3 for the heads up! We clearly didn't realize that something internal to Flyte depends on deterministic naming for CRs. Will make some updates taking that into account as well.

ddl-ebrown commented 3 days ago

> > I am not in favor of this, as randomness will lead to leaky workflows and duplicates. We should use the project ID itself or generate a consistent hash to increase inter-project execution entropy.
>
> Ah, thanks @kumare3 for the heads up! We clearly didn't realize that something internal to Flyte depends on deterministic naming for CRs. Will make some updates taking that into account as well.

Also, should mention @kumare3 that if by "leaky" you meant "CR might not be deleted from the cluster", the deletion process is robust because this uses the actual key of the workflow in conjunction with CR labels to perform deletes, rather than the CR name.

If there are dupe CRs for the same workflow though, that's clearly an issue regardless :)

EngHabu commented 3 days ago

@ddl-ebrown I agree with not introducing randomization... especially since the name already starts with a random string :-)

Instead, I would update this call to build something like project-domain-rand(10), hash that, and use the hash as the execution name...

I would also make the length of the execution name configurable in flyteadmin, so that in your deployment you can make it longer and get better entropy...
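
A rough Go sketch of that suggestion, under stated assumptions: `randString`, `hashName`, and `newExecutionName` are hypothetical names rather than flyteadmin functions, the `length` parameter stands in for whatever configuration option flyteadmin would expose, and SHA-256 with base32 is just one possible hash and encoding.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base32"
	"fmt"
	"strings"
)

const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"

// randString returns n characters drawn from alphabet. The modulo step has
// a slight bias, which is acceptable for a sketch.
func randString(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	for i, b := range buf {
		buf[i] = alphabet[int(b)%len(alphabet)]
	}
	return string(buf), nil
}

// hashName hashes seed and returns a name of the requested length,
// clamped to the range [2, 53]. Hypothetical helper.
func hashName(seed string, length int) string {
	sum := sha256.Sum256([]byte(seed))
	enc := strings.ToLower(
		base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(sum[:]))
	if length < 2 {
		length = 2
	}
	if length > len(enc)+1 {
		length = len(enc) + 1
	}
	// Leading letter keeps the name from starting with a digit.
	return "f" + enc[:length-1]
}

// newExecutionName builds project-domain-rand(10) and hashes it.
func newExecutionName(project, domain string, length int) (string, error) {
	suffix, err := randString(10)
	if err != nil {
		return "", err
	}
	return hashName(fmt.Sprintf("%s-%s-%s", project, domain, suffix), length), nil
}

func main() {
	name, err := newExecutionName("flytesnacks", "development", 30)
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
}
```

The random suffix is drawn once, when the execution name is created. Everything derived from that name afterwards stays deterministic, and a longer configured length keeps more of the hash in the name.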