GoogleCloudPlatform / oss-test-infra

https://oss-prow.knative.dev
Apache License 2.0

Enable kubeflow pipeline CI/CD for ppc64le #1972

Open mdafsanhossain opened 1 year ago

mdafsanhossain commented 1 year ago

We are trying to enable kubeflow pipelines prowjobs to run tests on ppc64le k8s clusters. To get started,

mdafsanhossain commented 1 year ago

I am thinking of an approach similar to knative's for adding the secrets related to the ppc64le cluster:

https://github.com/knative/infra/blob/a51593dc3677530150434ba6eb0c588bd520aefa/prow/cluster/build/secrets.yaml#L50

Any thoughts on this?
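What such a secret ultimately carries is presumably a kubeconfig that lets the Prow build side reach the ppc64le cluster. The following is a minimal Python sketch that builds a plain Kubernetes `v1` Secret manifest for that. The secret name `ppc64le-cluster-kubeconfig`, the namespace `test-pods` and the key `kubeconfig` are hypothetical, and the exact contents of knative's secrets.yaml are not reproduced here:

```python
import base64
import json

# Hypothetical names: the real secret name, namespace and key used by
# knative/infra or oss-test-infra may differ.
SECRET_NAME = "ppc64le-cluster-kubeconfig"
NAMESPACE = "test-pods"
KEY = "kubeconfig"


def kubeconfig_secret(kubeconfig_text: str) -> dict:
    """Build a v1 Secret manifest that stores a kubeconfig for the ppc64le cluster."""
    encoded = base64.b64encode(kubeconfig_text.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": SECRET_NAME, "namespace": NAMESPACE},
        "type": "Opaque",
        # Values under `data` must be base64-encoded.
        "data": {KEY: encoded},
    }


if __name__ == "__main__":
    manifest = kubeconfig_secret("apiVersion: v1\nkind: Config\n")
    # JSON is valid YAML, so this output can be piped to `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```

Real credentials would normally come from a secret store rather than being committed to the repo in plain text.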

lehrig commented 1 year ago

@chensun @zijianjoy - can you help answer the above questions?

(it's relevant for https://github.com/kubeflow/pipelines/issues/8660#issuecomment-1594657540)

ghatwala commented 1 year ago

For Tekton CI upstream, Power-based k8s clusters are used for running nightly CI jobs.
Initial proposal: https://github.com/tektoncd/community/blob/main/teps/0051-ppc64le-architecture-support.md
Nightly CI on the Power k8s cluster: https://dashboard.dogfooding.tekton.dev/#/namespaces/bastion-p/pipelineruns

seth-priya commented 1 year ago

@chensun @zijianjoy any thoughts on this?

lehrig commented 1 year ago

Any news on this, @chensun @zijianjoy?

chensun commented 1 year ago

> How can we add our k8s clusters which are in IBM Cloud? Will it be via K8s ExternalSecrets? https://github.com/GoogleCloudPlatform/oss-test-infra/blob/master/prow/oss/cluster/kubernetes_external_secrets.yaml

> What changes are required in https://github.com/GoogleCloudPlatform/oss-test-infra/tree/master/prow/prowjobs/kubeflow/pipelines to enable prowjobs for ppc64le?

I have no experience with the first question. You can give it a try following what you found from knative or tekton. We can test it in the field as long as the test, if it doesn't work, doesn't block the existing workflow.
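If the ExternalSecrets route is taken, each cluster credential would become one more entry in kubernetes_external_secrets.yaml. Below is a minimal Python sketch of such an entry. It assumes the `kubernetes-client.io/v1` ExternalSecret schema of the kubernetes-external-secrets controller with the `gcpSecretsManager` backend; that assumption should be checked against the existing entries in that file. The project ID, Secret Manager secret name, namespace and key are all hypothetical:

```python
import json

# Hypothetical values: replace with the real GCP project, Secret Manager
# secret name and target namespace used by the Prow build cluster.
PROJECT_ID = "example-gcp-project"
GSM_SECRET = "ppc64le-cluster-kubeconfig"


def external_secret(name: str, namespace: str, project_id: str,
                    gsm_secret: str, key: str) -> dict:
    """Build an ExternalSecret that syncs one Secret Manager entry into a k8s Secret.

    Assumes the kubernetes-client.io/v1 schema of kubernetes-external-secrets.
    """
    return {
        "apiVersion": "kubernetes-client.io/v1",
        "kind": "ExternalSecret",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "backendType": "gcpSecretsManager",
            "projectId": project_id,
            "data": [
                # `key` names the Secret Manager secret; `name` is the key
                # inside the resulting Kubernetes Secret.
                {"key": gsm_secret, "name": key, "version": "latest"},
            ],
        },
    }


if __name__ == "__main__":
    manifest = external_secret("ppc64le-cluster-kubeconfig", "test-pods",
                               PROJECT_ID, GSM_SECRET, "kubeconfig")
    print(json.dumps(manifest, indent=2))
```

The appeal of this route is that the kubeconfig itself stays in Secret Manager and only the pointer to it lives in the repo.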

For the second question, you can follow this example to add a new test: https://github.com/GoogleCloudPlatform/oss-test-infra/commit/b5340c00f25112d9af4d0bb0d926173c254562b0 (make sure to include optional: true). You can probably fork that job and customize it for testing on ppc64le. You may need to expand the argument list of https://github.com/kubeflow/pipelines/blob/master/test/presubmit-tests-with-pipeline-deployment.sh or even fork it.
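Forking an existing presubmit mostly means copying its entry, renaming it, pointing it at the ppc64le build cluster via Prow's `cluster` field, and marking it `optional: true` so a failure does not block merges. A minimal Python sketch of that transformation follows. The base job's cluster alias, image and command, as well as the target alias `ppc64le-ibmcloud`, are hypothetical placeholders, not the real contents of the kubeflow/pipelines job files:

```python
import copy
import json

# Hypothetical existing presubmit entry. In the real repo the jobs live in the
# YAML files under prow/prowjobs/kubeflow/pipelines; the cluster alias, image
# and command below are placeholders.
BASE_JOB = {
    "name": "kubeflow-pipelines-component-yaml",
    "cluster": "default",
    "decorate": True,
    "spec": {
        "containers": [
            {"image": "python:3.9", "command": ["./run-tests.sh"]}
        ]
    },
}


def fork_for_ppc64le(job: dict, cluster: str) -> dict:
    """Copy an existing presubmit and point the copy at a ppc64le build cluster."""
    forked = copy.deepcopy(job)  # leave the original job untouched
    forked["name"] = f"{job['name']}-ppc64le"
    forked["cluster"] = cluster
    # optional: true keeps a failing ppc64le run from blocking merges.
    forked["optional"] = True
    return forked


if __name__ == "__main__":
    # "ppc64le-ibmcloud" is a hypothetical alias; the cluster would first have
    # to be registered with Prow (e.g. through its kubeconfig secret).
    jobs = [BASE_JOB, fork_for_ppc64le(BASE_JOB, "ppc64le-ibmcloud")]
    # JSON is a subset of YAML, so this can be pasted into a job config file.
    print(json.dumps({"presubmits": {"kubeflow/pipelines": jobs}}, indent=2))
```

Keeping the forked job as a separate entry, rather than changing the existing one, means the x86 job keeps running exactly as before.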

mdafsanhossain commented 1 year ago

@chensun I am assuming we will need to customize this workflow file https://github.com/kubeflow/pipelines/blob/master/test/e2e_test_gke_v2.yaml for ppc64le? Or should this workflow work without modifications?

lehrig commented 1 year ago

@chensun, can you comment on this? We'd like to get this sorted out. Thanks!

chensun commented 1 year ago

> @chensun I am assuming we will need to customize this workflow file https://github.com/kubeflow/pipelines/blob/master/test/e2e_test_gke_v2.yaml for ppc64le? Or should this workflow work without modifications?

I don't know enough about the differences between ppc64le and x86/amd64; it's probably up to your exploration to determine whether customization is needed or not.
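One concrete starting point for that exploration: every container image the workflow pulls needs a linux/ppc64le variant. A minimal Python sketch that inspects the JSON printed by `docker manifest inspect <image>` (a manifest list / OCI image index, whose `manifests` entries carry `platform.architecture` and `platform.os`) is shown below. The sample data is made up for illustration, and which images e2e_test_gke_v2.yaml actually uses is not listed here:

```python
import json


def supports_platform(manifest_list: dict, arch: str = "ppc64le",
                      os_name: str = "linux") -> bool:
    """Return True if an image index / manifest list has an entry for os/arch.

    A single-architecture manifest has no "manifests" list, so it is
    conservatively reported as not supporting the requested platform.
    """
    for entry in manifest_list.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("architecture") == arch and platform.get("os") == os_name:
            return True
    return False


if __name__ == "__main__":
    # Abbreviated, made-up example shaped like `docker manifest inspect` output.
    sample = json.loads("""
    {"manifests": [
      {"platform": {"architecture": "amd64", "os": "linux"}},
      {"platform": {"architecture": "ppc64le", "os": "linux"}}
    ]}
    """)
    print(supports_platform(sample))           # True
    print(supports_platform(sample, "s390x"))  # False
```

Any image that fails this check would need a ppc64le build or a substitute before the workflow could run unmodified.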

valen-mascarenhas14 commented 11 months ago

@chensun @lehrig We successfully ran the prow job for kubeflow-pipelines-component-yaml on a ppc64le cluster using our local Prow setup, and the tests passed.

I've attached a link to the prow job here for your reference: kubeflow-pipelines-component-yaml-ppc64le.

lehrig commented 10 months ago

@chensun, does that look good to you, so we can prepare a PR?

chensun commented 10 months ago

kubeflow-pipelines-component-yaml is only a Python SDK test, which, I assume, does not depend much on the underlying system architecture. You might encounter other challenges when adding e2e tests.

That being said, it seems like you've figured out how to run a prow job on a ppc64le cluster. Feel free to send PRs to enable this route.

valen-mascarenhas14 commented 10 months ago

Hey @chensun, we are in the process of enabling the kubeflow-pipeline-e2e-test job. Since we are running the tests only on the ppc64le cluster, this raises a question about the deployment platform configuration.

Given that the presubmit-tests-with-pipeline-deployment.sh script utilizes GCP as the deployment platform, we are uncertain whether we should leverage the existing GCP configuration or create a new one tailored for our specific use case.

Your guidance on this matter would be of great help. Thanks
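One way to frame the "new configuration" option is to expand the script's arguments so GCP stays the default and an externally managed cluster becomes an opt-in. The real presubmit-tests-with-pipeline-deployment.sh is a Bash script and its actual flags are not reproduced here; the Python sketch below only models the idea, and the `--platform` and `--kubeconfig` flags and the `external` choice are hypothetical:

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Hypothetical model of the deployment options; not the real script's flags."""
    parser = argparse.ArgumentParser(description="Deploy KFP, then run the e2e tests")
    parser.add_argument(
        "--platform",
        choices=["gcp", "external"],
        default="gcp",  # keeps today's GCP behaviour as the default
        help="where Kubeflow Pipelines is deployed before the tests run",
    )
    parser.add_argument(
        "--kubeconfig",
        help="kubeconfig of an externally managed cluster, e.g. the ppc64le one",
    )
    args = parser.parse_args(argv)
    if args.platform == "external" and not args.kubeconfig:
        # parser.error prints the message and exits with status 2.
        parser.error("--kubeconfig is required with --platform external")
    return args


if __name__ == "__main__":
    print(parse_args(["--platform", "external", "--kubeconfig", "/path/to/kubeconfig"]))
```

Defaulting to `gcp` means existing jobs that pass no new flags keep their current behaviour, while the ppc64le job opts in explicitly.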

mkumatag commented 5 months ago

I also see an option for minikube; I wonder whether we can explore that path, or something else, to trigger the tests. I also see that they are using argocd, which may become one of the prerequisites for running these tests.

> Given that the presubmit-tests-with-pipeline-deployment.sh script utilizes GCP as the deployment platform, we are uncertain whether we should leverage the existing GCP configuration or create a new one tailored for our specific use case.
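If a minikube-based path with argocd turns out to be the way forward, a small preflight check on the test image can fail fast when a prerequisite CLI is missing. A minimal Python sketch using `shutil.which` is shown below; the tool list (`kubectl`, `minikube`, `argocd`) is a hypothetical guess based on the options mentioned in this thread, not a confirmed requirement of the kubeflow/pipelines scripts:

```python
import shutil

# Hypothetical prerequisite list for a minikube-based run; adjust it to
# whatever the chosen deployment path actually needs.
REQUIRED_TOOLS = ("kubectl", "minikube", "argocd")


def missing_tools(tools=REQUIRED_TOOLS) -> list:
    """Return the tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing prerequisites:", ", ".join(missing))
    else:
        print("all prerequisites found")
```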