Open alexec opened 4 years ago
Beyond multi-cluster, shall I create another issue for multi-namespace support? This is a related issue: https://github.com/argoproj/argo/issues/2063#issuecomment-668211852 It is about installing Argo Workflows in one namespace but supporting pod creation in multiple namespaces (not a cluster installation, as the permissions would be too broad).
More details:
PoC findings:
What went well:
argo cluster add other
Questions raised:
I've created a dev build for people to test out multi-cluster workflows (and therefore prove demand for it)
argoproj/workflow-controller:multic
Instructions for use:
https://github.com/argoproj/argo/blob/399286fc1884bf20419de4b091766b29bbca7d94/docs/multi-cluster.md
Please let me know how you get on with this.
Please answer this poll: https://argoproj.slack.com/archives/C8J6SGN12/p1607041333397500
@alexec what do you think of this? https://admiralty.io/blog/2019/01/17/running-argo-workflows-across-multiple-kubernetes-clusters (the link was once listed on the Argo Workflows website)
The blog post is slightly outdated, as Admiralty uses Virtual Kubelet and the Scheduler Framework now, but the use case still works. Admiralty creates a virtual node that represents a remote cluster, which makes multi-cluster workflows possible without any code change in the Argo project.
IMHO, multi-cluster is a common concern best treated separately. BTW, Admiralty also works with Argo CD.
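For readers who want to try that route, a rough sketch of what it could look like on the Argo side (hedged: this assumes an Admiralty-style setup, and the annotation key is taken from Admiralty's documented convention, so double-check their docs; names and images are placeholders):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: admiralty-demo-
spec:
  entrypoint: main
  # podMetadata is applied to every pod the workflow creates;
  # the annotation opts those pods into Admiralty's multi-cluster scheduling.
  podMetadata:
    annotations:
      multicluster.admiralty.io/elect: ""
  templates:
  - name: main
    container:
      image: alpine:3.18
      command: [echo, "hello from whichever cluster was elected"]

Because the scheduling decision happens at the pod level, the Workflow spec itself stays cluster-agnostic, which is the point being made here.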
Hi @adrienjt, thank you - I just tweeted at the post's author before realizing it was you. I'm aware that any first-class solution in Argo would be in competition with a multi-cluster scheduler, as it would make the need for one moot. I'm also aware from working on Argo CD that security with multi-cluster is difficult, because you end up with a single main cluster that has a lot of permissions.
I've updated the dev images during my white-space time today. You can test with these images:
alexcollinsintuit/workflow-controller:multic
alexcollinsintuit/argocli:multic
We really need to hear more concrete use cases to progress this.
Isn't multi-namespace already supported? I assume this could be done with a cluster-scoped installation, but instead of creating a ClusterRole, create Roles in each namespace you would like Argo to have access to.
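As a hedged illustration of that suggestion (the service account and namespace names below are hypothetical, and a real controller needs more than just pod permissions, e.g. pods/log, workflowtasksets, and configmaps):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: argo-pod-manager
  namespace: team-a            # repeat this Role in every namespace Argo should manage
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["create", "get", "list", "watch", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: argo-pod-manager
  namespace: team-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: argo-pod-manager
subjects:
- kind: ServiceAccount
  name: argo                   # the workflow controller's service account
  namespace: argo              # the namespace Argo is installed in

The controller still watches Workflow objects cluster-wide, but its write access to pods is scoped to the namespaces that carry these Roles.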
We really need to hear more concrete use cases to progress this.
@alexec For background on our use case:
We have 4 environments - each are separate clusters.
One is an 'operations' cluster that has argo-workflows installed. The rest are dev, staging, and production.
We have a workflow that updates multiple data stores with a lot of data.
Instead of 3 Argo installations/UIs, or exposing endpoints to the data stores so they can be reached by the operations Argo workflow, I'd rather be able to run a workflow pod in a different cluster than the one Argo is installed in, so I can have one UI/login with all my workflows that run in multiple clusters.
Right now we have to expose all these data stores and copy over a lot of the k8s secrets from the dev/staging/production clusters to the operations cluster in order for everything to work. I'd rather be able to just run a container in any connected cluster I specify.
@alexec our use case is as follows:
We have a central master cluster which needs to connect to multiple regional and edge Kubernetes clusters to run different workflows, depending on which workflows are provisioned in our central master Argo server.
Right now we have worked around this by using Git runners on each regional cluster to run some of our tasks. It is a cumbersome solution that is difficult to maintain, and it is hard to organize the sequence of tasks.
There are two main interpretations of "multi-cluster":
As this is ambiguous, we don't actually know which of these you want (or both).
Can I ask you to vote by adding the appropriate reaction to this comment? Go further to demonstrate your interest by adding a comment with the use case you're trying to solve.
Our use case for option 2:
Our workflow involves different steps running in different clusters. The first few steps extract and preprocess the data in the first cluster, then the next step trains on the data in a separate cluster (with GPUs) for machine learning purposes.
@alexec point 2 might be more extensible in terms of scaling, i.e. deploy workflow controllers in different namespaces and/or clusters and have them communicate with a single Argo server. It might also open the possibility of having workflow controllers outside Kubernetes (VM deployments), since we might not want specialized hardware such as GPU machines to be part of a cluster.
Where option 2 might be nice is where there is a secondary cluster for Windows nodes. Our primary cluster (Linux) uses a CNI that is not compatible with Windows, so we had to set up a separate cluster. It would be nice if the Argo Workflows installation on our primary cluster had the capability to schedule workloads on the secondary cluster for Windows-specific tasks.
Imagine someone using Argo Workflows for CI in a monorepo that builds both Linux and Windows Docker images. Instead of separate workflows, a single workflow with tasks that could be scheduled on the correct cluster could open up a lot of interesting possibilities.
Point 1 is the straightforward use case where you have several clients and cloud accounts with distinct clusters.
Managing workflows (UI plus a single client), especially cron workflows, from an Argo installation on a "central" cluster would simplify the work a lot. There may still be several Argos (one on each cluster), but having an existing main one with an abstraction and credentials over the rest might be an easier option than trying to work around the existing Argos in the subordinate clusters.
Our use case for option 2:
Our workflow involves different steps running in different clusters. The first few steps extract and preprocess the data in the first cluster, then the next step trains on the data in a separate cluster (with GPUs) for machine learning purposes.
From the machine learning perspective, this use case is increasingly popular. At AWS, I meet with many customers who are hybrid or multi-cloud. The ability to run steps that transfer data, run containers in different clusters, merge final results, and manage all steps in a single interface is highly valuable.
@srivathsanvc
To add to what's been said about clusters having different hardware, we have a use case with clusters of different architectures as well: ppc64le vs x86_64, but it could be any pair. We need to build packages on both and publish them to a single place.
Due to the nature of these packages, cross-compilers aren't an option, so we maintain two separate OpenShift clusters with their own Argo instances, etc. It would be nice to have a Workflow be able to schedule/track across clusters so we know when both sides have finished.
Just a comment: can't the cases with different architecture/hardware requirements be achieved with a nodeSelector? You don't need two separate clusters to support this; you just need these nodes in the same cluster with appropriate labels.
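A rough sketch of that idea, assuming both node pools live in one cluster and carry the standard kubernetes.io/arch label (the image and commands are placeholders):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: multi-arch-build-
spec:
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: build-x86
        template: build-amd64
      - name: build-ppc
        template: build-ppc64le
  - name: build-amd64
    nodeSelector:
      kubernetes.io/arch: amd64      # pin this step to x86_64 nodes
    container:
      image: alpine:3.18
      command: [sh, -c, "uname -m && echo build the package here"]
  - name: build-ppc64le
    nodeSelector:
      kubernetes.io/arch: ppc64le    # pin this step to ppc64le nodes
    container:
      image: alpine:3.18
      command: [sh, -c, "uname -m && echo build the package here"]

Both steps in the same parallel group run concurrently on nodes of the matching architecture.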
Use cases for multi-cluster workflows that we have observed recently:
A. Automating a workflow that uses a variety of processing resources (e.g., both CPU and GPU at different steps, or specific AWS, GCP, or Azure features at different steps)
B. Running and re-running a workflow across separate clusters that hold different client/customer data
C. Running an extremely large workflow that requires sharding the workload across multiple clusters in order to complete the job and avoid hitting resource limits
D. Automating a workflow that executes steps distributed across multiple cloud regions/data centers, e.g. complying with GDPR-type restrictions on where data must be stored, with those datasets spread across clusters in different regions
I'm curious if others are seeing these use cases. Perhaps upvote with the corresponding reaction if so!
Our use case where multi-cluster might simplify the architecture: we have a large number of simultaneously running workflows (tens of thousands). We don't fit into one region, so we plan to run Kubernetes clusters across two or three regions, and we have a few options for how to handle balancing:
If the workflow controller can handle it, it will greatly simplify the cluster topology, though it will also be much more heavily loaded because it must handle all the workflows in one process.
One of the major issues we have with all the listed setups is how to handle backpressure, especially when different regions have different capacity.
A multi-cluster PoC is ready for testing.
v0.0.0-dev-mc-0
(it will be published in about 30m)
This is an absolutely and madly powerful and great solution.
Some notes with the latest tag: I had to grant secrets access rights to the argo-workflow-controller service account, like:
kubectl create role access-secrets --verb=get,list,watch,update,create --resource=secrets -n argo
kubectl create rolebinding --role=access-secrets default-to-secrets --serviceaccount=argo:argo-workflow-controller -n argo
Previously the log showed:
msg="failed to get kubeconfig secret: secrets \"kubeconfig\" is forbidden: User \"system:serviceaccount:argo:argo-workflow-controller\" cannot get resource \"secrets\" in API group \"\ β
β " in the namespace \"argo\""
In the kubeconfig secret itself I'm trying to connect to an AWS (EKS) cluster via the aws command (not sure it is supposed to work though):
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: aaaaaa==...
    server: https://AAAAF.yl4.us-east-2.eks.amazonaws.com
  name: arn:aws:eks:us-east-2:XYZ:cluster/test
contexts:
- context:
    cluster: arn:aws:eks:us-east-2:XYZ:cluster/test
    user: arn:aws:eks:us-east-2:XYZ:cluster/test
  name: arn:aws:eks:us-east-2:XYZ:cluster/test
kind: Config
preferences: {}
users:
- name: arn:aws:eks:us-east-2:XYZ:cluster/test
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      args:
      - --region
      - us-east-2
      - eks
      - get-token
      - --cluster-name
      - test
      command: aws
      env:
      - name: AWS_ACCESS_KEY_ID
        value: 121212
      - name: AWS_SECRET_ACCESS_KEY
        value: 1344444
Error:
time="2021-08-28T09:45:47Z" level=info msg="index config" indexWorkflowSemaphoreKeys=true
time="2021-08-28T09:45:47Z" level=info msg="cron config" cronSyncPeriod=10s
time="2021-08-28T09:45:47.882Z" level=info msg="not enabling pprof debug endpoints"
time="2021-08-28T09:45:47.914Z" level=info msg="Get secrets 200"
E0828 09:45:47.915537 1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 1 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1d49520, 0x2fd6c40)
/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/runtime/runtime.go:74 +0x95
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x2fd5c10, 0x1, 0x1)
/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/runtime/runtime.go:48 +0x86
panic(0x1d49520, 0x2fd6c40)
/usr/local/go/src/runtime/panic.go:965 +0x1b9
github.com/argoproj/argo-workflows/v3/workflow/controller.NewWorkflowController(0x22bb8f0, 0xc00014dc40, 0x22bc568, 0xc0003040a0, 0x22ea6a8, 0xc0005f5340, 0x2283ae0, 0xc00051f4b0, 0xc00014a9e0, 0x4, ...)
/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:145 +0x1ce
main.NewRootCommand.func1(0xc0001dfb80, 0xc00011e780, 0x0, 0x8, 0x0, 0x0)
/go/src/github.com/argoproj/argo-workflows/cmd/workflow-controller/main.go:104 +0x63b
github.com/spf13/cobra.(*Command).execute(0xc0001dfb80, 0xc00004c0a0, 0x8, 0x8, 0xc0001dfb80, 0xc00004c0a0)
/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:856 +0x472
github.com/spf13/cobra.(*Command).ExecuteC(0xc0001dfb80, 0xc00006c778, 0xc00010ff78, 0x406365)
/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:974 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:902
main.main()
/go/src/github.com/argoproj/argo-workflows/cmd/workflow-controller/main.go:151 +0x2b
E0828 09:45:47.915583 1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 1 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1d49520, 0x2fd6c40)
/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/runtime/runtime.go:74 +0x95
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x2fd5c10, 0x1, 0x1)
/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/runtime/runtime.go:51 +0xcb
panic(0x1d49520, 0x2fd6c40)
/usr/local/go/src/runtime/panic.go:965 +0x1b9
github.com/argoproj/argo-workflows/v3/workflow/controller.NewWorkflowController(0x22bb8f0, 0xc00014dc40, 0x22bc568, 0xc0003040a0, 0x22ea6a8, 0xc0005f5340, 0x2283ae0, 0xc00051f4b0, 0xc00014a9e0, 0x4, ...)
/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:145 +0x1ce
main.NewRootCommand.func1(0xc0001dfb80, 0xc00011e780, 0x0, 0x8, 0x0, 0x0)
/go/src/github.com/argoproj/argo-workflows/cmd/workflow-controller/main.go:104 +0x63b
github.com/spf13/cobra.(*Command).execute(0xc0001dfb80, 0xc00004c0a0, 0x8, 0x8, 0xc0001dfb80, 0xc00004c0a0)
/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:856 +0x472
github.com/spf13/cobra.(*Command).ExecuteC(0xc0001dfb80, 0xc00006c778, 0xc00010ff78, 0x406365)
/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:974 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:902
main.main()
/go/src/github.com/argoproj/argo-workflows/cmd/workflow-controller/main.go:151 +0x2b
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x1b375ee]
goroutine 1 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x2fd5c10, 0x1, 0x1)
/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/runtime/runtime.go:55 +0x109
panic(0x1d49520, 0x2fd6c40)
/usr/local/go/src/runtime/panic.go:965 +0x1b9
github.com/argoproj/argo-workflows/v3/workflow/controller.NewWorkflowController(0x22bb8f0, 0xc00014dc40, 0x22bc568, 0xc0003040a0, 0x22ea6a8, 0xc0005f5340, 0x2283ae0, 0xc00051f4b0, 0xc00014a9e0, 0x4, ...)
/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:145 +0x1ce
main.NewRootCommand.func1(0xc0001dfb80, 0xc00011e780, 0x0, 0x8, 0x0, 0x0)
/go/src/github.com/argoproj/argo-workflows/cmd/workflow-controller/main.go:104 +0x63b
github.com/spf13/cobra.(*Command).execute(0xc0001dfb80, 0xc00004c0a0, 0x8, 0x8, 0xc0001dfb80, 0xc00004c0a0)
/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:856 +0x472
github.com/spf13/cobra.(*Command).ExecuteC(0xc0001dfb80, 0xc00006c778, 0xc00010ff78, 0x406365)
/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:974 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:902
main.main()
/go/src/github.com/argoproj/argo-workflows/cmd/workflow-controller/main.go:151 +0x2b
CM:
apiVersion: v1
data:
  cluster: main
  config: |
    containerRuntimeExecutor: emissary
    artifactRepository:
      s3:
        accessKeySecret:
          key: accesskey
          name:
        secretKeySecret:
          key: secretkey
          name:
        bucket:
        endpoint:
        insecure: true
    sso:
      clientId:
        key: client-id
        name: argo-workflows-sso
      clientSecret:
        key: client-secret
        name: argo-workflows-sso
      issuer: https://argo-cd.dev.example.com/api/dex
      redirectUrl: https://argo-wf.dev.example.com/oauth2/callback
      scopes:
      - groups
      - email
      - openid
      sessionExpiry: 240h
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"v1","data":{"config":"containerRuntimeExecutor: emissary\nartifactRepository:\n s3:\n accessKeySecret:\n key: accesskey\n name: \n secretKeySecret:\n key: secretkey\n name: \n bucket: \n endpoint: \n insecure: true\nsso:\n clientId:\n key: client-id\n name: argo-workflows-sso\n clientSecret:\n key: client-secret\n name: argo-workflows-sso\n issuer: https://argo-cd.dev.example.com/api/dex\n redirectUrl: https://argo-wf.dev.example.com/oauth2/callback\n scopes:\n - groups\n - email\n - openid\n sessionExpiry: 240h\n"},"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"workflow-controller","app.kubernetes.io/instance":"argo-wf","app.kubernetes.io/managed-by":"Helm","app.kubernetes.io/name":"argo-workflows-cm","app.kubernetes.io/part-of":"argo-workflows","argocd.argoproj.io/instance":"argo-wf","helm.sh/chart":"argo-workflows-0.5.0"},"name":"argo-workflow-controller-configmap","namespace":"argo"}}
creationTimestamp: "2021-08-20T14:37:02Z"
labels:
app.kubernetes.io/component: workflow-controller
app.kubernetes.io/instance: argo-wf
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: argo-workflows-cm
app.kubernetes.io/part-of: argo-workflows
argocd.argoproj.io/instance: argo-wf
helm.sh/chart: argo-workflows-0.5.0
name: argo-workflow-controller-configmap
namespace: argo
resourceVersion: "7460842"
uid: 20b10f52-007e-4e39-aa6f-e2472ec24883
@shuker85 your config requires the "aws" binary to be installed on the workflow controller image. I think you can mount a volume with the binary on it and set the PATH env var to point to it.
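A hypothetical shape for that idea (not from the PoC docs; image, paths, and names are placeholders), grafted onto the controller Deployment:

spec:
  template:
    spec:
      volumes:
      - name: custom-tools
        emptyDir: {}
      initContainers:
      - name: fetch-tools
        image: alpine:3.18                                     # any image able to fetch/copy the binary
        command: [sh, -c, "cp /path/to/aws /custom-tools/"]    # download or copy the CLI into the shared volume here
        volumeMounts:
        - name: custom-tools
          mountPath: /custom-tools
      containers:
      - name: controller
        volumeMounts:
        - name: custom-tools
          mountPath: /custom-tools
        env:
        - name: PATH          # prepend the mount so the controller can find the binary
          value: /custom-tools:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

Note that the AWS CLI v2 is not a single static binary, so in practice the whole install directory may need to end up on the shared volume, not just the aws executable.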
@alexec any hints on how to accomplish that?
Hi @alexec, I've tried to use an initContainer in order to get the aws binary:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "9"
    kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"workflow-controller","app.kubernetes.io/instance":"argo-wf","app.kubernetes.io/managed-by":"Helm","app.kubernetes.io/name":"argo-workflows-workflow-controller","app.kubernetes.io/part-of":"argo-workflows","app.kubernetes.io/version":"v0.0.0-dev-mc-0","argocd.argoproj.io/instance":"argo-wf","helm.sh/chart":"argo-workflows-0.5.0"},"name":"argo-workflow-controller","namespace":"argo"},"spec":{"replicas":2,"selector":{"matchLabels":{"app.kubernetes.io/instance":"argo-wf","app.kubernetes.io/name":"argo-workflows-workflow-controller"}},"template":{"metadata":{"labels":{"app.kubernetes.io/component":"workflow-controller","app.kubernetes.io/instance":"argo-wf","app.kubernetes.io/managed-by":"Helm","app.kubernetes.io/name":"argo-workflows-workflow-controller","app.kubernetes.io/part-of":"argo-workflows","app.kubernetes.io/version":"v0.0.0-dev-mc-0","helm.sh/chart":"argo-workflows-0.5.0"}},"spec":{"affinity":{"podAntiAffinity":{"preferredDuringSchedulingIgnoredDuringExecution":[{"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"app.kubernetes.io/name","operator":"In","values":["argo-workflows-workflow-controller"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"},"weight":100}]}},"containers":[{"args":["--configmap","argo-workflow-controller-configmap","--executor-image","quay.io/argoproj/argoexec:v0.0.0-dev-mc-0","--loglevel","info","--gloglevel","0"],"command":["workflow-controller"],"env":[{"name":"ARGO_NAMESPACE","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"metadata.namespace"}}},{"name":"LEADER_ELECTION_IDENTITY","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"metadata.name"}}}],"image":"quay.io/argoproj/workflow-controller:v0.0.0-dev-mc-0","imagePullPolicy":"Always","livenessProbe":{"failureThreshold":3,"httpGet":{"path":"/healthz","port":6060},"initialDelaySeconds":90,"periodSeconds":60,"timeoutSeconds":30},"name":"controller","ports":[{"containerPort":9090,"name":"metrics"},{"containerPort":6060}],"resources":{},"securityContext":{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]},"readOnlyRootFilesystem":true,"runAsNonRoot":true}}],"nodeSelector":{"kubernetes.io/os":"linux"},"serviceAccountName":"argo-workflow-controller"}}}}
creationTimestamp: "2021-08-20T14:37:04Z"
generation: 9
labels:
app.kubernetes.io/component: workflow-controller
app.kubernetes.io/instance: argo-wf
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: argo-workflows-workflow-controller
app.kubernetes.io/part-of: argo-workflows
app.kubernetes.io/version: v0.0.0-dev-mc-0
argocd.argoproj.io/instance: argo-wf
helm.sh/chart: argo-workflows-0.5.0
name: argo-workflow-controller
namespace: argo
resourceVersion: "8307045"
uid: e043a62a-b7d7-4c98-acd4-1103d58881fa
spec:
progressDeadlineSeconds: 600
replicas: 2
revisionHistoryLimit: 10
selector:
matchLabels:
app.kubernetes.io/instance: argo-wf
app.kubernetes.io/name: argo-workflows-workflow-controller
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app.kubernetes.io/component: workflow-controller
app.kubernetes.io/instance: argo-wf
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: argo-workflows-workflow-controller
app.kubernetes.io/part-of: argo-workflows
app.kubernetes.io/version: v0.0.0-dev-mc-0
helm.sh/chart: argo-workflows-0.5.0
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- podAffinityTerm:
labelSelector:
matchExpressions:
- key: app.kubernetes.io/name
operator: In
values:
- argo-workflows-workflow-controller
topologyKey: failure-domain.beta.kubernetes.io/zone
weight: 100
volumes:
- name: aws-bin
emptyDir: {}
initContainers:
- name: instal-aws-bin
image: registry.opensuse.org/opensuse/tumbleweed:latest
command: ["/bin/bash", "-c"]
args:
- |
set -x
echo "Installing AWS-CLI...";
zypper -n in curl unzip which
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip -q awscliv2.zip
./aws/install
cp $(which -a aws) /custom-tools/aws
echo "Done.";
volumeMounts:
- mountPath: /custom-tools
name: aws-bin
containers:
- args:
- --configmap
- argo-workflow-controller-configmap
- --executor-image
- quay.io/argoproj/argoexec:v0.0.0-dev-mc-0
- --loglevel
- debug
- --gloglevel
- "9"
command:
- workflow-controller
env:
- name: ARGO_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: LEADER_ELECTION_IDENTITY
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
image: quay.io/argoproj/workflow-controller:v0.0.0-dev-mc-0
imagePullPolicy: Always
livenessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 6060
scheme: HTTP
initialDelaySeconds: 90
periodSeconds: 60
successThreshold: 1
timeoutSeconds: 30
name: controller
volumeMounts:
- mountPath: /usr/bin/aws
name: aws-bin
subPath: aws
ports:
- containerPort: 9090
name: metrics
protocol: TCP
- containerPort: 6060
protocol: TCP
resources: {}
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
readOnlyRootFilesystem: true
runAsNonRoot: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
nodeSelector:
kubernetes.io/os: linux
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: argo-workflow-controller
serviceAccountName: argo-workflow-controller
terminationGracePeriodSeconds: 30
InitContainer log:
Installing AWS-CLI...
+ echo 'Installing AWS-CLI...'
+ zypper -n -q in curl unzip which
The following 3 NEW packages are going to be installed:
curl unzip which
3 new packages to install.
Overall download size: 554.2 KiB. Already cached: 0 B. After the operation, additional 1.0 MiB will be used.
Continue? [y/n/v/...? shows all options] (y): y
+ curl https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip -o awscliv2.zip
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 42.2M 100 42.2M 0 0 85.9M 0 --:--:-- --:--:-- --:--:-- 86.0M
+ unzip -q awscliv2.zip
+ ./aws/install
You can now run: /usr/local/bin/aws --version
++ which -a aws
+ cp /usr/local/bin/aws /custom-tools/aws
+ echo Done.
Done.
WF-controller logs:
time="2021-08-29T19:33:58Z" level=info msg="index config" indexWorkflowSemaphoreKeys=true
time="2021-08-29T19:33:58Z" level=info msg="cron config" cronSyncPeriod=10s
time="2021-08-29T19:33:58.047Z" level=info msg="not enabling pprof debug endpoints"
I0829 19:33:58.048197 1 merged_client_builder.go:121] Using in-cluster configuration
I0829 19:33:58.048476 1 merged_client_builder.go:163] Using in-cluster namespace
I0829 19:33:58.049316 1 round_trippers.go:425] curl -k -v -XGET -H "User-Agent: workflow-controller/v0.0.0 (linux/amd64) kubernetes/$Format/argo-workflows/v0.0.0-dev-mc-0 argo-controller" -H "Authorization: Bearer <masked>" -H "Accept: application/json, */*" 'https://172.16.16.1:443/api/v1/namespaces/argo/secrets/kubeconfig'
time="2021-08-29T19:33:58.063Z" level=info msg="Get secrets 200"
I0829 19:33:58.063510 1 round_trippers.go:445] GET https://172.16.16.1:443/api/v1/namespaces/argo/secrets/kubeconfig 200 OK in 14 milliseconds
I0829 19:33:58.063521 1 round_trippers.go:451] Response Headers:
I0829 19:33:58.063526 1 round_trippers.go:454] Cache-Control: no-cache, private
I0829 19:33:58.063530 1 round_trippers.go:454] Content-Type: application/json
I0829 19:33:58.063534 1 round_trippers.go:454] X-Kubernetes-Pf-Flowschema-Uid: 1664c2d8-01d8-48ff-9f10-21d83f7749e2
I0829 19:33:58.063538 1 round_trippers.go:454] X-Kubernetes-Pf-Prioritylevel-Uid: 5b41ea25-f82c-4db8-8ee6-84695e8c001f
I0829 19:33:58.063543 1 round_trippers.go:454] Content-Length: 3600
I0829 19:33:58.063547 1 round_trippers.go:454] Date: Sun, 29 Aug 2021 19:33:58 GMT
I0829 19:33:58.063552 1 round_trippers.go:454] Audit-Id: 731d63f6-44a8-41ac-a070-4b8f48c731ed
I0829 19:33:58.063628 1 request.go:1107] Response Body: {"kind":"Secret","apiVersion":"v1","metadata":{"name":"kubeconfig","namespace":"argo","uid":"ad32fa6f-b2cc-43a7-b253-da758e2d16ea","resourceVersion":"7450787","creationTimestamp":"2021-08-28T09:01:44Z","managedFields":[{"manager":"kubectl-create","operation":"Update","apiVersion":"v1","time":"2021-08-28T09:01:44Z","fieldsType":"FieldsV1","fieldsV1":{"f:data":{".":{},"f:value":{}},"f:type":{}}}]},"data":{"value":"11111111mRxdjlZYzZpRURtWWxRSlAwSlRXa0w5Cg=="},"type":"Opaque"}
E0829 19:33:58.065511 1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 1 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1d49520, 0x2fd6c40)
/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/runtime/runtime.go:74 +0x95
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x2fd5c10, 0x1, 0x1)
/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/runtime/runtime.go:48 +0x86
panic(0x1d49520, 0x2fd6c40)
/usr/local/go/src/runtime/panic.go:965 +0x1b9
github.com/argoproj/argo-workflows/v3/workflow/controller.NewWorkflowController(0x22bb8f0, 0xc0001419c0, 0x22bc568, 0xc00055e000, 0x22ea6a8, 0xc0003946e0, 0x2283ae0, 0xc000023880, 0xc00005aa40, 0x4, ...)
/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:145 +0x1ce
main.NewRootCommand.func1(0xc0001a6780, 0xc000192100, 0x0, 0x8, 0x0, 0x0)
/go/src/github.com/argoproj/argo-workflows/cmd/workflow-controller/main.go:104 +0x63b
github.com/spf13/cobra.(*Command).execute(0xc0001a6780, 0xc000142010, 0x8, 0x8, 0xc0001a6780, 0xc000142010)
/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:856 +0x472
github.com/spf13/cobra.(*Command).ExecuteC(0xc0001a6780, 0xc00006c778, 0xc00059df78, 0x406365)
/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:974 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:902
main.main()
/go/src/github.com/argoproj/argo-workflows/cmd/workflow-controller/main.go:151 +0x2b
E0829 19:33:58.065566 1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 1 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1d49520, 0x2fd6c40)
/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/runtime/runtime.go:74 +0x95
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x2fd5c10, 0x1, 0x1)
/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/runtime/runtime.go:51 +0xcb
panic(0x1d49520, 0x2fd6c40)
/usr/local/go/src/runtime/panic.go:965 +0x1b9
github.com/argoproj/argo-workflows/v3/workflow/controller.NewWorkflowController(0x22bb8f0, 0xc0001419c0, 0x22bc568, 0xc00055e000, 0x22ea6a8, 0xc0003946e0, 0x2283ae0, 0xc000023880, 0xc00005aa40, 0x4, ...)
/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:145 +0x1ce
main.NewRootCommand.func1(0xc0001a6780, 0xc000192100, 0x0, 0x8, 0x0, 0x0)
/go/src/github.com/argoproj/argo-workflows/cmd/workflow-controller/main.go:104 +0x63b
github.com/spf13/cobra.(*Command).execute(0xc0001a6780, 0xc000142010, 0x8, 0x8, 0xc0001a6780, 0xc000142010)
/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:856 +0x472
github.com/spf13/cobra.(*Command).ExecuteC(0xc0001a6780, 0xc00006c778, 0xc00059df78, 0x406365)
/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:974 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:902
main.main()
/go/src/github.com/argoproj/argo-workflows/cmd/workflow-controller/main.go:151 +0x2b
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x1b375ee]
goroutine 1 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x2fd5c10, 0x1, 0x1)
/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/runtime/runtime.go:55 +0x109
panic(0x1d49520, 0x2fd6c40)
/usr/local/go/src/runtime/panic.go:965 +0x1b9
github.com/argoproj/argo-workflows/v3/workflow/controller.NewWorkflowController(0x22bb8f0, 0xc0001419c0, 0x22bc568, 0xc00055e000, 0x22ea6a8, 0xc0003946e0, 0x2283ae0, 0xc000023880, 0xc00005aa40, 0x4, ...)
/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:145 +0x1ce
main.NewRootCommand.func1(0xc0001a6780, 0xc000192100, 0x0, 0x8, 0x0, 0x0)
/go/src/github.com/argoproj/argo-workflows/cmd/workflow-controller/main.go:104 +0x63b
github.com/spf13/cobra.(*Command).execute(0xc0001a6780, 0xc000142010, 0x8, 0x8, 0xc0001a6780, 0xc000142010)
/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:856 +0x472
github.com/spf13/cobra.(*Command).ExecuteC(0xc0001a6780, 0xc00006c778, 0xc00059df78, 0x406365)
/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:974 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:902
main.main()
/go/src/github.com/argoproj/argo-workflows/cmd/workflow-controller/main.go:151 +0x2b
Which leads to https://github.com/argoproj/argo-workflows/blob/e7d0b7c8507fe635ca845a1f550eb866fb7d27b4/cmd/workflow-controller/main.go#L151
Ok. There was a bug. Can you try v0.0.0-dev-mc-1?
Thanks @alexec ,
level=fatal msg="Failed to register watch for controller config map: if you have an item in your config map named 'config', you must only have one item"
More logs in https://gist.github.com/shuker85/eba5fb4452d9063adb42c39eb449f70b
Looks like you're mixing old-style and new-style configuration. Try manually editing the config map to fix it.
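Roughly, the difference looks like this (values trimmed; in the old style everything lives under a single config key, while in the new style each setting is its own data key, which is what allows an extra key such as cluster to coexist):

Old style:

apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
data:
  config: |
    containerRuntimeExecutor: emissary
    artifactRepository:
      s3:
        bucket: my-bucket

New style:

apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
data:
  cluster: main
  containerRuntimeExecutor: emissary
  artifactRepository: |
    s3:
      bucket: my-bucket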
You're right, my workflow-controller-configmap has been populated by the community Helm chart, where everything lands under data.config.
I've tried to adapt latest changes from https://argoproj.github.io/argo-workflows/workflow-controller-configmap.yaml
Also, since I created the local namespace, I think it lacked the proper permissions from step (4) in the readme:
kubectl -n remote apply -f https://raw.githubusercontent.com/argoproj/argo-workflows/master/manifests/quick-start/base/workflow-role.yaml
kubectl -n remote create sa workflow
kubectl -n remote create rolebinding workflow --role=workflow-role --serviceaccount=remote:workflow
I did s/remote/local/ in this case.
The latest error I got from trying to run the WF:
controller time="2021-08-30T14:41:09.901Z" level=info msg="Get leases 200"
controller time="2021-08-30T14:41:09.914Z" level=info msg="Update leases 200"
controller time="2021-08-30T14:41:10.283Z" level=info msg="List workflowtasksets 403"
controller E0830 14:41:10.283688 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.20.4/tools/cache/reflector.go:167: Failed to watch *v1alpha1.WorkflowTaskSet: failed to list *v1alpha1.WorkflowTaskSet: workflowtasksets.argoproj.io is forbidden: User "system:serviceaccount:argo:argo-workflow-controller" cannot list resource "workflowtasksets" in API group "argoproj.io" at the cluster scope
controller time="2021-08-30T14:41:14.923Z" level=info msg="Get leases 200"
controller time="2021-08-30T14:41:14.946Z" level=info msg="Update leases 200"
controller time="2021-08-30T14:41:16.564Z" level=info msg="List workflows 200"
controller time="2021-08-30T14:41:16.564Z" level=info msg=healthz age=5m0s err="<nil>" instanceID= labelSelector="!workflows.argoproj.io/phase,!workflows.argoproj.io/controller-instanceid" managedNamespace=
controller time="2021-08-30T14:41:19.955Z" level=info msg="Get leases 200"
...
controller time="2021-08-30T14:41:35.069Z" level=info msg="Update leases 200"
controller time="2021-08-30T14:41:35.813Z" level=info msg="Processing workflow" namespace=local workflow=multi-cluster-jf2qq
controller time="2021-08-30T14:41:35.823Z" level=info msg="Get configmaps 404"
controller time="2021-08-30T14:41:35.823Z" level=warning msg="Non-transient error: configmaps \"artifact-repositories\" not found"
controller time="2021-08-30T14:41:35.823Z" level=info msg="resolved artifact repository" artifactRepositoryRef=default-artifact-repository
controller time="2021-08-30T14:41:35.823Z" level=info msg="Updated phase -> Running" namespace=local workflow=multi-cluster-jf2qq
controller time="2021-08-30T14:41:35.823Z" level=info msg="Pod node multi-cluster-jf2qq initialized Pending" namespace=local workflow=multi-cluster-jf2qq
controller time="2021-08-30T14:41:35.823Z" level=error msg="Recovered from panic" namespace=local r="runtime error: invalid memory address or nil pointer dereference" stack="goroutine 242 [running[]:\nruntime/debug.Stack(0xc037b16329, 0x1d49520, 0x2fd6c40)\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x9f\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).operate.func2(0xc0000ec0c0, 0x22bb9e8, 0xc000128000)\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:192 +0xd1\npanic(0x1d49520, 0x2fd6c40)\n\t/usr/local/go/src/runtime/panic.go:971 +0x499\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).createWorkflowPod(0xc0000ec0c0, 0x22bb9e8, 0xc000128000, 0xc000c684e0, 0x13, 0xc0002df008, 0x1, 0x1, 0xc000d50280, 0xc0002defc8, ...)\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/workflowpod.go:152 +0x1f9\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).executeContainer(0xc0000ec0c0, 0x22bb9e8, 0xc000128000, 0xc000c684e0, 0x13, 0xc000869120, 0x19, 0xc000d50280, 0x22a6670, 0xc0000ec480, ...)\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:2373 +0x310\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).executeTemplate(0xc0000ec0c0, 0x22bb9e8, 0xc000128000, 0xc000c684e0, 0x13, 0x22a6670, 0xc0000ec480, 0xc000c206c0, 0x0, 0x0, ...)\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:1813 +0x268e\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).operate(0xc0000ec0c0, 0x22bb9e8, 0xc000128000)\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:342 +0xf1e\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).processNextItem(0xc000100800, 0x22bb9e8, 0xc000128000, 0x0)\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:841 +0x830\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).runWorker(0xc000100800)\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:763 +0x9b\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0001f36a0)\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:155 +0x5f\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0001f36a0, 0x22754c0, 0xc000baf9e0, 0x1, 0xc000115980)\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:156 +0x9b\nk8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0001f36a0, 0x3b9aca00, 0x0, 0x1, 0xc000115980)\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:133 +0x98\nk8s.io/apimachinery/pkg/util/wait.Until(0xc0001f36a0, 0x3b9aca00, 0xc000115980)\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:90 +0x4d\ncreated by github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).startLeading\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:389 +0x527\n" workflow=multi-cluster-jf2qq
controller time="2021-08-30T14:41:35.823Z" level=info msg="Updated phase Running -> Error" namespace=local workflow=multi-cluster-jf2qq
controller time="2021-08-30T14:41:35.823Z" level=info msg="Updated message -> runtime error: invalid memory address or nil pointer dereference" namespace=local workflow=multi-cluster-jf2qq
controller time="2021-08-30T14:41:35.823Z" level=info msg="Marking workflow completed" namespace=local workflow=multi-cluster-jf2qq
controller time="2021-08-30T14:41:35.823Z" level=info msg="Checking daemoned children of " namespace=local workflow=multi-cluster-jf2qq
controller time="2021-08-30T14:41:35.842Z" level=info msg="Update workflows 200"
controller time="2021-08-30T14:41:35.843Z" level=info msg="Workflow update successful" namespace=local phase=Error resourceVersion=8853138 workflow=multi-cluster-jf2qq
controller time="2021-08-30T14:41:35.846Z" level=info msg="Create events 201"
controller time="2021-08-30T14:41:35.861Z" level=info msg="Create events 201"
controller time="2021-08-30T14:41:40.075Z" level=info msg="Get leases 200"
controller time="2021-08-30T14:41:40.084Z" level=info msg="Update leases 200"
...
controller time="2021-08-30T14:41:55.157Z" level=info msg="Update leases 200"
controller time="2021-08-30T14:41:59.790Z" level=info msg="List workflowtasksets 403"
controller E0830 14:41:59.790436 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.20.4/tools/cache/reflector.go:167: Failed to watch *v1alpha1.WorkflowTaskSet: failed to list *v1alpha1.WorkflowTaskSet: workflowtasksets.argoproj.io is forbidden: User "system:serviceaccount:argo:argo-workflow-controller" cannot list resource "workflowtasksets" in API group "argoproj.io" at the cluster scope
controller time="2021-08-30T14:42:00.163Z" level=info msg="Get leases 200"
controller time="2021-08-30T14:42:00.182Z" level=info msg="Update leases 200"
controller time="2021-08-30T14:42:05.198Z" level=info msg="Get leases 200"
controller time="2021-08-30T14:42:05.212Z" level=info msg="Update leases 200"
controller time="2021-08-30T14:42:09.994Z" level=info msg="Watch workflowtemplates 200"
controller time="2021-08-30T14:42:10.217Z" level=info msg="Get leases 200"
controller time="2021-08-30T14:42:10.234Z" level=info msg="Update leases 200"
controller time="2021-08-30T14:44:52.936Z" level=info msg="Alloc=8142 TotalAlloc=63486 Sys=73809 NumGC=18 Goroutines=172"
controller time="2021-08-30T14:45:16.559Z" level=info msg=healthz age=5m0s err="<nil>" instanceID= labelSelector="!workflows.argoproj.io/phase,!workflows.argoproj.io/controller-instanceid" managedNamespace=
controller time="2021-08-30T14:45:21.592Z" level=info msg="Get leases 200"
controller time="2021-08-30T14:45:21.614Z" level=info msg="Update leases 200"
controller time="2021-08-30T14:45:26.164Z" level=info msg="List workflowtasksets 403"
controller E0830 14:45:26.164677 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.20.4/tools/cache/reflector.go:167: Failed to watch *v1alpha1.WorkflowTaskSet: failed to list *v1alpha1.WorkflowTaskSet: workflowtasksets.argoproj.io is forbidden: User "system:serviceaccount:argo:argo-workflow-controller" cannot list resource "workflowtasksets" in API group "argoproj.io" at the cluster scope
This panic occurred because there is no context for your cluster. Check that the kubeconfig contains a context with the same name as the cluster in your template. I'll add some extra diagnostics.
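To make that concrete with a hedged sketch (all names, servers, and credentials below are placeholders): if a workflow refers to a cluster called cluster-1, the kubeconfig stored in the secret needs a context whose name is exactly cluster-1:

apiVersion: v1
kind: Config
current-context: cluster-1
clusters:
- name: cluster-1
  cluster:
    server: https://cluster-1.example.com:6443
    certificate-authority-data: <base64-encoded CA>
contexts:
- name: cluster-1          # must match the cluster name the workflow uses
  context:
    cluster: cluster-1
    user: cluster-1-user
users:
- name: cluster-1-user
  user:
    token: <service-account bearer token>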
Is there any update on this feature?
@boshnak - @JPZ13 and I have been working on a feature based on @alexec 's POC where users can run a workflow that has 2+ steps running in different clusters or namespaces. We're wrapping up the design phase for that now. This is different from a multi-cluster control plane, which is our day job at Pipekit.
Would you mind sharing your use case(s) and any requirements you have for multi-cluster workflows? If you're open to speaking live, I'm at c@pipekit.io and we can find a time to run through your requirements.
@caelan-io Thanks for the prompt response. Our main use case is that we have 4 Kubernetes clusters, and we would like to have a single centralized Argo Workflows instance from which we will be able to trigger workflows on all the clusters. So it's a combination of having a central Argo server and the ability to trigger workflows on other clusters. I guess having specific steps triggered on different clusters does answer the majority of the use case.
Hi @alexec thanks for working on this. I've been trying to make it work, but I am not sure why the workflow pod is not spinning up on the remote cluster; the logs just indicate it is choosing the main cluster (where Argo is installed).
Logs from the workflow-controller seem to indicate the cluster is added and that it is reading the kubeconfig secret:
time="2022-03-07T23:09:47.244Z" level=info msg="starting pod informer" cluster=main labelSelector="workflows.argoproj.io/completed=false,!multi-cluster.argoproj.io/owner-cluster,!workflows.argoproj.io/controller-instanceid" managedNamespace=argo time="2022-03-07T23:09:47.244Z" level=info msg="starting pod informer" cluster=dgx-us labelSelector="workflows.argoproj.io/completed=false,multi-cluster.argoproj.io/owner-cluster=main,!workflows.argoproj.io/controller-instanceid" managedNamespace=argo
Notice how the pod indicates it is going to cluster main, instead of cluster dgx-us as I am specifying in the workflow:
time="2022-03-07T23:12:27.408Z" level=info msg="Processing workflow" namespace=argo workflow=multi-cluster-finaltest time="2022-03-07T23:12:27.411Z" level=info msg="Get configmaps 200" time="2022-03-07T23:12:27.411Z" level=info msg="resolved artifact repository" artifactRepositoryRef="argo/#" time="2022-03-07T23:12:27.411Z" level=info msg="Updated phase -> Running" namespace=argo workflow=multi-cluster-finaltest time="2022-03-07T23:12:27.411Z" level=info msg="Pod node multi-cluster-finaltest initialized Pending" namespace=argo workflow=multi-cluster-finaltest time="2022-03-07T23:12:27.411Z" level=info msg="creating workflow pod" cluster=main exists=false namespace=argo nodeID=multi-cluster-finaltest ownershipCluster=main podName=multi-cluster-finaltest time="2022-03-07T23:12:27.439Z" level=info msg="Create events 201" time="2022-03-07T23:12:27.452Z" level=info msg="Create pods 201" time="2022-03-07T23:12:27.457Z" level=info msg="Created pod: multi-cluster-finaltest (multi-cluster-finaltest)" namespace=argo workflow=multi-cluster-finaltest time="2022-03-07T23:12:27.457Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=multi-cluster-finaltest time="2022-03-07T23:12:27.457Z" level=info msg=reconcileAgentPod namespace=argo workflow=multi-cluster-finaltest time="2022-03-07T23:12:27.473Z" level=info msg="Update workflows 200" time="2022-03-07T23:12:27.474Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=253591298 workflow=multi-cluster-finaltest
NOTE: I am trying to run the workflows locally in the argo namespace as well as on the remote cluster.
I would much appreciate your guidance. I am using tag: v0.0.0-dev-mc-4
Hi @alexec , this feature looks really promising, thanks for working on it! Just curious though, roughly how long do you think it'll take to release this officially? My team is looking to make use of multi-cluster workflows in the future, so we'd appreciate any estimates you can provide.
I really like this feature. What is the status of this feature now?
This is now available to try out:
Install from here:
https://github.com/argoproj/argo-workflows/releases/tag/v0.0.0-dev-mc-6
Read this to learn how to configure:
https://github.com/argoproj/argo-workflows/blob/dev-mc/docs/multi-cluster.md
Can I please ask everyone to complete this survey:
Hi @alexec, I would like to confirm whether the newest version (v0.0.0-dev-mc-6) supports different steps of the same workflow running on different clusters.
I tested on my local site: when I install profiles for two clusters, member1 and member2, it produces an error message like profile not found for policy argo,member2,default,1.
Hi @XiShanYongYe-Chang I need to update this, as I've changed the design. Hopefully today.
v0.0.0-dev-mc-7 is now ready for testing.
@alexec thanks for your reply. I tested with v0.0.0-dev-mc-7, and it's pretty good to me. With this release, I can run workflow-a- on the member1 cluster and workflow-b- on the member2 cluster separately.
Will this feature be supported within one workflow, such as workflow-test-, where step-a runs in the member1 cluster and step-b runs in the member2 cluster?
Yes. That's the primary intent.
@alexec Is there a way to list the cluster profiles for debugging? I ran the argo cluster get-profile cluster-1 ... command, and it was successful, but on running a workflow, I get error in entry template execution: profile not found for "cluster-1".
kubectl get secret -l workflows.argoproj.io/cluster will list the profile secrets.
I think I'm running into a namespace issue. kubectl get secret -l workflows.argoproj.io/cluster returns No resources found in default namespace., and kubectl get secret -l workflows.argoproj.io/cluster -n argo returns argo.profile.cluster-1 Opaque 1 86m. @alexec what namespace do you have the server and controller installed in, and which namespace is the profile in for you?
The profile goes in the Argo system namespace.
Updated version for testing:
https://github.com/argoproj/argo-workflows/releases/tag/v0.0.0-dev-mc-8
Summary
Run workflows across multiple clusters.
Motivation
So that you only need to run one Argo Workflows installation, and so that you can run a workflow that has nodes in different clusters.
Proposal
Like Argo CD.
3516
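"Like Argo CD" presumably refers to how Argo CD registers remote clusters: a Secret per cluster holding the API server address and credentials. For reference, that convention looks roughly like this (this is Argo CD's documented format, shown only to illustrate the proposal, not an Argo Workflows API; names and values are placeholders):

apiVersion: v1
kind: Secret
metadata:
  name: cluster-prod
  labels:
    argocd.argoproj.io/secret-type: cluster   # how Argo CD discovers cluster secrets
type: Opaque
stringData:
  name: prod
  server: https://prod.example.com:6443
  config: |
    {
      "bearerToken": "<token>",
      "tlsClientConfig": {
        "caData": "<base64-encoded CA>"
      }
    }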
Message from the maintainers:
If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.