Open ddl-ebrown opened 2 weeks ago
Attention: Patch coverage is 90.00000%
with 2 lines
in your changes missing coverage. Please review.
Project coverage is 61.00%. Comparing base (
bba8c11
) to head (d994304
). Report is 2 commits behind head on master.
Files | Patch % | Lines |
---|---|---|
flyteadmin/pkg/workflowengine/impl/k8s_executor.go | 85.71% | 1 Missing and 1 partial :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I am not in favor of this, as randomness will lead to leaky workflows and duplicates. We should use the project id itself or generate a consistent hash to increase inter project execution entropy
I am not in favor of this, as randomness will lead to leaky workflows and duplicates. We should use the project id itself or generate a consistent hash to increase inter project execution entropy
Ah thanks @kumare3 for the heads up! We clearly didn't realize there was something internal to Flyte that depends on deterministic naming for CRs -- will make some updates taking that into account as well
I am not in favor of this, as randomness will lead to leaky workflows and duplicates. We should use the project id itself or generate a consistent hash to increase inter project execution entropy
Ah thanks @kumare3 for the heads up! We clearly didn't realize there was something internal to Flyte that depends on deterministic naming for CRs -- will make some updates taking that into account as well
Also, should mention @kumare3 that if by "leaky" you meant "CR might not be deleted from the cluster", the deletion process is robust because this uses the actual key of the workflow in conjunction with CR labels to perform deletes, rather than the CR name.
If there are dupe CRs for the same workflow though, that's clearly an issue regardless :)
@ddl-ebrown I agree with not introducing randomization... specially that the name already starts with a random string :-)
Instead, I would update this call to use something like project-domain-rand(10)
and hash that and that becomes the execution name...
I would also make the length of the execution name configurable in flyteadmin. so in your deployment you can make it longer and give you better entropy...
@ddl-rliu did most of the work on this one - making this an upstream PR as it resolved a real issue for us.
Tracking issue
Why are the changes needed?
Typically Flyte is configured so that each project / domain has its own Kubernetes namespace.
Certain environments may change this behavior by using the Flyteadmin namespace_mapping setting to put all executions in fewer (or a singular) Kubernetes namespace. This is problematic because it can lead to collisions in the naming of the CR that flyteadmin generates.
What changes were proposed in this pull request?
This patch fixes 2 important things to make this work properly inside of Flyte:
it adds a random element to the CR name in Flyte so that the CR is named by the execution + some unique value when created by flyteadmin
Without this change, an execution Foo in project A will prevent an execution Foo in project B from launching, because the name of the CR thats generated in Kubernetes assumes that the namespace the CRs are put into is different for project A and project B
When namespace_mapping is set to a singular value, that assumption is wrong
it makes sure that when flytepropeller cleans up the CR resource that it uses Kubernetes labels to find the correct CR -- so instead of assuming that it can use the execution name, it instead uses the project, domain and execution labels
How was this patch tested?
This is deployed in a live Flyte setup where we have automated tests. We observed that the CR names were correctly unique after this and the initial collision no longer occurred.
Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link