deployKF / deployKF

deployKF builds machine learning platforms on Kubernetes. We combine the best of Kubeflow, Airflow†, and MLflow† into a complete platform.
https://www.deploykf.org/
Apache License 2.0

allow deploying alongside existing argo-workflows controller #116

Open thesuperzapper opened 3 months ago

thesuperzapper commented 3 months ago

Motivation

Currently, it's not possible to run deployKF alongside an existing Argo Workflows controller, if the Kubeflow Pipelines tool is enabled.

Specifically, users are not allowed to set kubeflow_dependencies.kubeflow_argo_workflows.enabled to false, if kubeflow_tools.pipelines.enabled is true.
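As a rough illustration of the constraint (not deployKF's actual validation code), the rule can be pictured as a check over the values. `check_values` is a hypothetical helper, and treating a missing key as `True` is an assumption made for this sketch; only the two value paths come from this issue.

```python
def check_values(values: dict) -> list:
    """Return error messages for the disallowed combination (hypothetical helper)."""
    # ASSUMPTION: a missing key is treated as enabled; real defaults may differ.
    argo_enabled = (
        values.get("kubeflow_dependencies", {})
        .get("kubeflow_argo_workflows", {})
        .get("enabled", True)
    )
    pipelines_enabled = (
        values.get("kubeflow_tools", {})
        .get("pipelines", {})
        .get("enabled", True)
    )

    errors = []
    # The combination described above: pipelines on, embedded Argo Workflows off.
    if pipelines_enabled and not argo_enabled:
        errors.append(
            "kubeflow_dependencies.kubeflow_argo_workflows.enabled must be true "
            "when kubeflow_tools.pipelines.enabled is true"
        )
    return errors
```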

Implementation

There are good reasons why we don't let users bring their own Argo Workflows when using Kubeflow Pipelines: it would be nearly impossible to get working, because Kubeflow Pipelines depends on a specific Argo Workflows version and has many KFP-specific requirements (like credentials for the S3 buckets).

We also pre-configure an Argo Server (Web UI) instance that is connected to deployKF's auth system, so users can have the same access as they have in the Kubeflow Pipelines UI (based on their profile membership).

Workarounds

Currently, there are three workarounds for existing Argo Workflows users:

  1. Don't use Kubeflow Pipelines with deployKF:
    • set kubeflow_dependencies.kubeflow_argo_workflows.enabled and kubeflow_tools.pipelines.enabled to false.
  2. Uninstall your existing Argo Workflows controller, and migrate to the one managed by deployKF:
    • NOTE: We need feedback about what else you might want to configure about our embedded Argo Workflows, to allow this migration.
  3. Configure your existing Argo Workflows controller instance id to something non-default, and update all your non-KFP workflows to select this instance:
    • NOTE: We need feedback on this approach, so we can confirm if it works properly.
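
To make workaround 3 concrete, here is a minimal sketch of how instance id matching separates two controllers. It models the behavior as I understand it: a controller with an instance id only picks up workflows labeled with that id, and a controller without one only picks up workflows that lack the label. The function name and the id `my-existing-argo` are hypothetical, and the sketch assumes the deployKF-managed controller runs with the default (empty) instance id.

```python
from typing import Optional

# Label that Argo Workflows uses to tie a Workflow to a controller instance.
INSTANCE_ID_LABEL = "workflows.argoproj.io/controller-instanceid"


def controller_handles(instance_id: Optional[str], workflow_labels: dict) -> bool:
    """Hypothetical model of which controller reconciles a given workflow."""
    if instance_id:
        # A controller with an instance id only handles matching workflows.
        return workflow_labels.get(INSTANCE_ID_LABEL) == instance_id
    # A controller without an instance id ignores any workflow carrying the label.
    return INSTANCE_ID_LABEL not in workflow_labels


# ASSUMPTION: the existing controller is reconfigured with this hypothetical id.
existing_controller_id = "my-existing-argo"
deploykf_controller_id = None  # assumed default (no instance id)

kfp_workflow = {}  # KFP workflows carry no instance id label
own_workflow = {INSTANCE_ID_LABEL: "my-existing-argo"}

print(controller_handles(deploykf_controller_id, kfp_workflow))  # deployKF picks up KFP runs
print(controller_handles(existing_controller_id, own_workflow))  # existing controller picks up its own
```

Under these assumptions, each workflow is handled by exactly one controller, which is why every non-KFP workflow needs the label added.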

andef commented 3 months ago

I can confirm that using an instance id on the existing workflow controller solves the problem. I also checked the controller's source code, which supports this finding.

There is still an issue with the metadata-writer, which can only be configured to watch one namespace or all namespaces, using the label selector workflows.argoproj.io/workflow. It needs to either support multiple namespaces or check that the instance id label does not exist. Or, as you mentioned in Slack @thesuperzapper, KFP should use an instance id for its controller. This does not cause any problems for the workflows themselves; it just results in unnecessary writes to the metadata DB and some extra labels on the pods.
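
A small sketch of the "instance id label does not exist" idea, to show what would change. This is not the metadata-writer's code: the function names are hypothetical, and it assumes pods created by a controller with an instance id carry the workflows.argoproj.io/controller-instanceid label.

```python
WORKFLOW_LABEL = "workflows.argoproj.io/workflow"
INSTANCE_ID_LABEL = "workflows.argoproj.io/controller-instanceid"

# Equivalent Kubernetes label selector strings (existence / non-existence syntax).
CURRENT_SELECTOR = WORKFLOW_LABEL
PROPOSED_SELECTOR = f"{WORKFLOW_LABEL},!{INSTANCE_ID_LABEL}"


def selected_today(pod_labels: dict) -> bool:
    """Current behavior as described: any pod with the workflow label matches."""
    return WORKFLOW_LABEL in pod_labels


def selected_if_instance_id_excluded(pod_labels: dict) -> bool:
    """Proposed behavior: also require that the instance id label is absent."""
    return WORKFLOW_LABEL in pod_labels and INSTANCE_ID_LABEL not in pod_labels


kfp_pod = {WORKFLOW_LABEL: "my-pipeline-run"}
other_pod = {WORKFLOW_LABEL: "nightly-job", INSTANCE_ID_LABEL: "my-existing-argo"}

print(selected_today(other_pod))                    # matched today (extra metadata writes)
print(selected_if_instance_id_excluded(other_pod))  # skipped with the extra check
print(selected_if_instance_id_excluded(kfp_pod))    # KFP pods still match
```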