⚠️ Argoflow-AWS has been superseded by deployKF ⚠️
deployKF makes it easy to build reliable ML Platforms on Kubernetes and supports more than just Kubeflow!
deployKF supports all Kubernetes distributions and has native integrations with AWS.
This project offers a Kubeflow distribution that has the following characteristics:
This distribution assumes that you will be making use of the following AWS services:
external-dns
Pod a ServiceAccount that uses an IAM Role allowing certain actions in Route53. See the section below for a detailed listing of the IRSA policies that are needed.
In the future we may develop overlays that make some of these services optional. For the current release, removing any of them has to be done by hand after forking the repo.
Below you will find all of the IAM Policies that need to be attached to the IRSA roles. Before looking at the policies, note that IRSA works by setting up a trust relationship with a specific ServiceAccount in a specific Namespace. If an IAM role is not being assumed correctly, the most likely cause is that it is attached to a ServiceAccount that the role's trust relationship does not explicitly authorize.
Let's take the external-dns service as an example. The ServiceAccount for this application is named external-dns and is rolled out in the kube-system Namespace. To allow this ServiceAccount to assume an IAM Role, we have to set a trust relationship that looks as follows:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.eu-central-1.amazonaws.com/id/SOMEUNIQUEID1234567890"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.eu-central-1.amazonaws.com/id/SOMEUNIQUEID1234567890:sub": "system:serviceaccount:kube-system:external-dns"
        }
      }
    }
  ]
}
For every IRSA Role you set up, you will need a trust relationship like the one above. Substitute your actual OIDC provider URL, and replace "kube-system" and "external-dns" in system:serviceaccount:kube-system:external-dns with the appropriate Namespace and ServiceAccount names respectively.
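Since every IRSA role needs the same document with only a few values changed, the trust relationship can be generated rather than copied by hand. The following is a minimal sketch, not part of this repository: the function name irsa_trust_policy and its parameters are hypothetical, and the account ID and OIDC provider in the usage example are the placeholder values from the policy above.

```python
import json


def irsa_trust_policy(account_id: str, oidc_provider: str,
                      namespace: str, service_account: str) -> dict:
    """Build an IRSA trust policy for exactly one ServiceAccount.

    oidc_provider is the provider URL without the https:// prefix,
    e.g. oidc.eks.eu-central-1.amazonaws.com/id/SOMEUNIQUEID1234567890
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # The "sub" claim pins the role to one Namespace/ServiceAccount pair.
                        f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}"
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = irsa_trust_policy(
        "123456789012",
        "oidc.eks.eu-central-1.amazonaws.com/id/SOMEUNIQUEID1234567890",
        "kube-system",
        "external-dns",
    )
    print(json.dumps(policy, indent=2))
```

Running the script prints a document equivalent to the example above. Changing the last two arguments yields the trust relationship for another Namespace and ServiceAccount.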
Further down in this guide we explain how to initialise this repository. For now, just take note that we use placeholder values such as <<__role_arn.external_dns__>> that will be replaced by the actual ARNs of the roles you wish to use. Below is a listing of all of the IRSA roles in use in this repository, along with links to JSON files with example policies. If you search the whole "distribution" folder, you will find exactly where these placeholders are used.
aws-load-balancer-controller
Needs policies that allow it to provision an NLB in specific subnets.
<<__role_arn.aws_load_balancer_controller__>>
arn:aws:iam::123456789012:role/my-cluster_kube-system_aws-load-balancer-controller
cluster-autoscaler
Needs policies that allow it to automatically scale EC2 instances up/down.
<<__role_arn.cluster_autoscaler__>>
arn:aws:iam::123456789012:role/my-cluster_kube-system_aws-cluster-autoscaler
external-dns
Needs policies that allow it to automatically create record sets in Route53.
<<__role_arn.external_dns__>>
arn:aws:iam::123456789012:role/my-cluster_kube-system_external-dns
certificate-manager
Needs policies that allow it to automatically create entries in Route53 in order to allow for DNS-01 solving.
<<__role_arn.cert_manager__>>
arn:aws:iam::123456789012:role/my-cluster_cert-manager_cert-manager
external-secrets
The external-secrets application is a middleman that reads ExternalSecret custom resources in specific namespaces and creates the corresponding Kubernetes Secrets from values stored in AWS. It can be configured in two ways.
Option 1: Allow the external-secrets application broad authority to read and write AWS secrets.
Option 2: Allow the external-secrets application to assume roles that have more narrowly defined permissions.
<<__role_arn.external_secrets__>>
arn:aws:iam::123456789012:role/my-cluster_kube-system_external-secrets
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/my-cluster_kube-system_external-secrets"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
In addition, we need to grant each role limited access to secrets. We have chosen an approach of limiting access to secrets by namespace, but it is possible to make this more granular if desired.
ExternalSecret for the argocd namespace:
<<__role_arn.external_secrets.argocd__>>=arn:aws:iam::123456789012:role/my-cluster_argocd
ExternalSecret for the kubeflow namespace:
<<__role_arn.external_secrets.kubeflow__>>=arn:aws:iam::123456789012:role/my-cluster_kubeflow
ExternalSecret for the mlflow namespace:
<<__role_arn.external_secrets.mlflow__>>=arn:aws:iam::123456789012:role/my-cluster_mlflow
ExternalSecret for the auth namespace:
<<__role_arn.external_secrets.auth__>>=arn:aws:iam::123456789012:role/my-cluster_auth
ExternalSecret for the istio-system namespace:
<<__role_arn.external_secrets.auth__>>=arn:aws:iam::123456789012:role/my-cluster_istio-system
ExternalSecret for the monitoring namespace:
<<__role_arn.external_secrets.monitoring__>>=arn:aws:iam::123456789012:role/my-cluster_monitoring
There are two supported AWS backend types, secretsManager and systemManager. Select one of them with the matching line:
<<__external_secrets.backend_type__>>=secretsManager
<<__external_secrets.backend_type__>>=systemManager
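To see where the placeholders listed above are used, the "distribution" folder can be scanned for the <<__...__>> pattern. The sketch below is an illustration only and is not shipped with this repository: find_placeholders and scan_folder are hypothetical names, and the regular expression assumes placeholders consist of letters, digits, underscores and dots between <<__ and __>>.

```python
import re
from pathlib import Path
from typing import Dict, List

# Assumed placeholder shape: <<__some.dotted_name__>>
PLACEHOLDER = re.compile(r"<<__[\w.]+__>>")


def find_placeholders(text: str) -> List[str]:
    """Return every placeholder found in text, in order of appearance."""
    return PLACEHOLDER.findall(text)


def scan_folder(root: Path) -> Dict[str, List[str]]:
    """Map each file under root to the placeholders it contains."""
    hits: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip binary files
        found = find_placeholders(text)
        if found:
            hits[str(path)] = found
    return hits


if __name__ == "__main__":
    for filename, names in scan_folder(Path("distribution")).items():
        print(filename, sorted(set(names)))
```

Running the same scan after initialising the repository is a quick way to spot placeholders that were never given a value. A placeholder that does not follow the assumed pattern would not be reported.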
Unfortunately, it is currently not possible to use IRSA in conjunction with Kubeflow Pipelines, which uses both the Minio Go and JavaScript clients. Both clients need additional work to enable IRSA. Please see this tracking issue: https://github.com/kubeflow/pipelines/issues/3405
For now, we use an IAM User in order to facilitate writing Pipeline artifacts to S3. The user's credentials are fetched from the AWS Secrets Manager using an ExternalSecret. The relevant details for the IAM User are as follows:
<<__external_secret_name.kubeflow.s3_accesskey__>>
<<__external_secret_name.kubeflow.s3_secretkey__>>
This repository contains Kustomize manifests that point to the upstream manifest of each Kubeflow component and provide an easy way for people to change their deployment according to their needs. ArgoCD application manifests for each component will be used to deploy Kubeflow. The intended usage is for people to fork this repository, make their desired kustomizations, run a script to change the ArgoCD application specs to point to their fork of this repository, and finally apply a master ArgoCD application that will deploy all other applications.
Mandatory:
Optional (if using setup_credentials.sh to generate initial credentials as sealed secrets):
setup.conf file and setup_repo.sh script
This repository uses a very simple initialisation script, ./setup_repo.sh, that takes a config file such as the example one, ./examples/setup.conf, and iterates over all lines therein. A single line would, for example, look as follows:
<<__role_arn.cluster_autoscaler__>>=arn:aws:iam::123456789012:role/my-cluster_kube-system_aws-cluster-autoscaler
The init script will look for all occurrences in the ./distribution folder of the placeholder <<__role_arn.cluster_autoscaler__>> and will replace it with the value arn:aws:iam::123456789012:role/my-cluster_kube-system_aws-cluster-autoscaler. Please note that comments (//, #), quotation marks (", ') and unnecessary line breaks should be avoided in the config file.
You may add any additional placeholder/value pairs you want. The naming convention <<__...__>> has no functional purpose other than to aid readability and minimise the risk of a "find-and-replace" being performed on a value that was not meant as a placeholder.
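The find-and-replace behaviour described above can be sketched in a few lines. This is not the actual ./setup_repo.sh, only an illustration of the idea under stated assumptions: each config line is split on its first "=" and the left-hand side is replaced literally wherever it occurs. The function names parse_config, substitute and apply_to_folder are hypothetical.

```python
from pathlib import Path
from typing import Dict


def parse_config(text: str) -> Dict[str, str]:
    """Parse placeholder=value lines, splitting on the first '=' only."""
    pairs: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        placeholder, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a placeholder=value line: {line!r}")
        pairs[placeholder] = value
    return pairs


def substitute(text: str, pairs: Dict[str, str]) -> str:
    """Replace every occurrence of each placeholder with its value."""
    for placeholder, value in pairs.items():
        text = text.replace(placeholder, value)
    return text


def apply_to_folder(root: Path, pairs: Dict[str, str]) -> None:
    """Rewrite, in place, every text file under root that contains a placeholder."""
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        try:
            original = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip binary files
        updated = substitute(original, pairs)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
```

Because the replacement is purely literal, a stray quotation mark or trailing comment in the config file would end up inside the substituted value, which is one reason to keep the file free of them.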
Finally, if you wish, you can use the "setup_credentials.sh" script to generate SealedSecrets that will be used for access to "admin" applications, such as the ArgoCD dashboard (in the future), Dex, Keycloak, the Kubeflow admin user, etc. This script will generate various random credentials and create a "sealed" representation that is safe to declare in your Git repository.
Run the following commands to install the kubeseal CLI on Linux:
wget https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.16.0/kubeseal-linux-amd64 -O kubeseal
sudo install -m 755 kubeseal /usr/local/bin/kubeseal
On macOS you can use Homebrew to install the kubeseal CLI:
brew install kubeseal
Next, ensure passlib is installed:
pip install passlib
Deploy the Sealed Secrets controller to the cluster:
kubectl apply -f distribution/argocd-applications/sealed-secrets.yaml
Finally, the script can be run with:
./setup_credentials.sh --email test@test.com --username youruser --firstname Yourname --lastname Yoursurname --password yourpassword
You may leave out any of the input parameters. In that case, a default value (or a generated value in the case of passwords) will be used. Alternatively, environment variables can be used instead of input parameters.
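The fallback behaviour described above can be pictured as a small resolution function. The sketch below is an assumption-heavy illustration and is not taken from setup_credentials.sh: the precedence order (input parameter, then environment variable, then default, then a generated value), the function name resolve_parameter and the environment variable name in the usage example are all hypothetical.

```python
import os
import secrets
from typing import Mapping, Optional


def resolve_parameter(cli_value: Optional[str],
                      env_name: str,
                      default: Optional[str] = None,
                      environ: Optional[Mapping[str, str]] = None) -> str:
    """Pick a value for one setting.

    Assumed order: explicit parameter, environment variable, default,
    and finally a randomly generated value (useful for passwords).
    """
    env = os.environ if environ is None else environ
    if cli_value:
        return cli_value
    if env.get(env_name):
        return env[env_name]
    if default is not None:
        return default
    # No value supplied anywhere: generate a random, URL-safe secret.
    return secrets.token_urlsafe(24)


if __name__ == "__main__":
    # EXAMPLE_PASSWORD is a made-up variable name used only for this demo.
    password = resolve_parameter(None, "EXAMPLE_PASSWORD")
    print(f"generated a password of {len(password)} characters")
```

Check the script itself for the environment variable names and defaults it actually reads.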
To initialise your repository, do the following:
Edit distribution/kubeflow.yaml with the selection of applications you wish to roll out.
Run the initialisation script:
./setup_repo.sh setup.conf
Optionally, generate the initial credentials:
./setup_credentials.sh --email test@test.com --username youruser --firstname Yourname --lastname Yoursurname --password yourpassword
Start up external-secrets:
kustomize build distribution/external-secrets/ | kubectl apply -f -
Start up ArgoCD:
If you are using a public repo:
kustomize build distribution/argocd/base/ | kubectl apply -f -
If you are using a private repo (note that this will use an ExternalSecret to fetch git credentials from the AWS Secrets Manager):
kustomize build distribution/argocd/overlays/private-repo/ | kubectl apply -f -
Finally, roll out Kubeflow with:
kubectl apply -f distribution/kubeflow.yaml
If you wish, you may also set up ArgoCD to manage itself, as follows:
If you are using a public repo:
kubectl apply -f distribution/argocd-applications/argocd.yaml
If you are using a private repo:
kubectl apply -f distribution/argocd-applications/argocd-private-repo.yaml
To customize the list of images presented in the Jupyter Web App and other related settings, such as allowing custom images, edit the spawner_ui_config.yaml file.
A large problem for many people is how to easily upload or download data to and from the PVCs mounted as their workspace volumes for Notebook Servers. To make this easier, a simple PVCViewer Controller was created (a slightly modified version of the tensorboard-controller). This feature was not ready in time for 1.3, so I am only documenting it here as an experimental feature, as I believe many people would like to have this functionality. The images are pulled from my personal Docker Hub profile, but I can provide instructions for people who would like to build the images themselves. It is also important to note that the PVC Viewer will work with ReadWriteOnce PVCs, even when they are mounted to an active Notebook Server.
Here is an example of the PVC Viewer in action:
To use the PVCViewer Controller, it must be deployed along with an updated version of the Volumes Web App. To do so, deploy experimental-pvcviewer-controller.yaml and experimental-volumes-web-app.yaml instead of the regular Volumes Web App. If you are deploying Kubeflow with the kubeflow.yaml file, you can edit the root kustomization.yaml and comment out the regular Volumes Web App and uncomment the PVCViewer Controller and Experimental Volumes Web App.
By default, all the ArgoCD application specs included here are set up to automatically sync with the specified repoURL. If you would like to change something about your deployment, simply make the change, commit it and push it to your fork of this repo. ArgoCD will automatically detect the changes and update the necessary resources in your cluster.
By default, the ArgoCD UI is rolled out behind a ClusterIP Service. It can be accessed for development purposes with port forwarding, for example:
kubectl port-forward svc/argocd-server -n argocd 8888:80
The UI will now be accessible at localhost:8888 and can be accessed with the initial admin password. The password is stored in a secret and can be read as follows:
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
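The base64 -d step is needed because Kubernetes stores the values in a Secret's data field base64-encoded. As a small sketch of the same decoding done programmatically, the hypothetical helper below takes the JSON produced by kubectl get secret ... -o json and returns one decoded field. The sample document in the usage example is made up for illustration.

```python
import base64
import json


def decode_secret_field(secret_json: str, key: str) -> str:
    """Return the decoded value of data[key] from a Secret given as JSON."""
    secret = json.loads(secret_json)
    return base64.b64decode(secret["data"][key]).decode("utf-8")


if __name__ == "__main__":
    # Made-up Secret document standing in for real kubectl output.
    sample = json.dumps(
        {"data": {"password": base64.b64encode(b"example-password").decode("ascii")}}
    )
    print(decode_secret_field(sample, "password"))
```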
If you wish to update the password, this can be done with the argocd CLI, using the following commands:
argocd login localhost:8888
argocd account update-password
Before contributing, please install pre-commit and initialise .pre-commit-config.yaml by running the following from the repo's root directory:
pre-commit install
Please feel free to add features by forking this repo, developing and testing your feature, and merging back to master via a Pull Request. We are currently still a small community, but feel free to also report bugs or request features on the issue board!