aws-samples / eks-kubeflow-workshop

Kubeflow workshop on EKS. Mainly focus on AWS integration examples. Please go check kubeflow website http://kubeflow.org for other examples
Apache License 2.0
96 stars 56 forks source link

Improve mnist pipeline example #33

Closed Jeffwan closed 4 years ago

Jeffwan commented 4 years ago

https://github.com/aws-samples/eks-kubeflow-workshop/blob/master/notebooks/05_Kubeflow_Pipeline/05_03_Pipeline_mnist.ipynb

  1. Consider to use random names for mnist-ui, mnist-model. Otherwise user has to delete resources to rerun the pipeline. This brings some challenges to get exact name for inference. There's a tradeoff and we should keep simplification in mind and make changes to improve experiences.

Reference: commands to delete resources

k delete deployment mnist-model  mnist-ui
k delete svc mnist-service  mnist-ui
k delete vs mnist-ui

Note: k is alias of kubectl -n kubeflow

  1. Currently, user doesn't have permission to check kubeflow namespace. Check why this doesn't work?
    kubectl create --namespace=anonymous rolebinding --clusterrole=kubeflow-view --serviceaccount=anonymous:default-editor anonymous-kubeflow-view 

Otherwise, we just give user cluster-admin permission which is not recommended but works.

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: anonymous-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: default-editor
  namespace: anonymous
  1. Provides Dockerfile and update python to write event logs, write an artifact to enable tensorboard

  2. Use high level KFP Op to construct TFjob instead of ResourceOp

  3. Consider to pass S3 MNIST training data to pipeline instead of download dataset in the code. Add instructions to replace S3 export bucket to user's

Jeffwan commented 4 years ago

/cc @PatrickXYS @cfregly

PatrickXYS commented 4 years ago

Hi @Jeffwan, I was wondering do we really need have permission to check kubeflow namespace? Since this is required because the namespace we set in the notebook is kubeflow, what if we set the namespace as the namespace we created in kubeflow dashboard, then we don't need rolebinding.

For example, if I use below logic to find the namespace I'm using right now not simply kubeflow:

with open('/var/run/secrets/kubernetes.io/serviceaccount/namespace', 'r') as f:
    namespace = f.readline()

Then the SA user should have permission to check the namespace. We then set the namespace we got above in the pipeline:

def mnist_pipeline(
        name="mnist-{{workflow.uid}}",
        namespace=namespace,
        step="1000",
        s3bucketexportpath=""):

And it should be good to go, I've already tested it.

Jeffwan commented 4 years ago

@PatrickXYS

Once multi-user pipeline is supported, this will be easier.

Currently, there're two different groups of containers

  1. Pipeline runners - by default they are running inside Kubeflow
  2. ResourceOps -> katib, tfjob, mnist deployment/svc. If pipeline-runner SA has permission, they can create resources in separate namespaces

I think there're few principles

  1. I think it's ok to have pipeline runners and applications in separate namespace.
  2. It would be better to have all resources running in user's namespace which means katib job, tfjob and all rest resources in this pipeline should be in user's own namespace.
with open('/var/run/secrets/kubernetes.io/serviceaccount/namespace', 'r') as f:
    namespace = f.readline()

The namespace here is still the pod namespace, the runner's pod is in kubeflow namespace, then it's kubeflow here. How do you use other namespace here?

PatrickXYS commented 4 years ago

@Jeffwan The latest PR is able to address No.1, 3, 4. Will put more efforts to solve all the issues.

Jeffwan commented 4 years ago

Let's close this issue now. If we have extra feedbacks, we can revisit the issue