aws-samples / eks-kubeflow-workshop

Kubeflow workshop on EKS. Mainly focus on AWS integration examples. Please go check kubeflow website http://kubeflow.org for other examples
Apache License 2.0

Pipeline mnist gives an error during pipeline run #44

Closed dalbhanj closed 4 years ago

dalbhanj commented 4 years ago

Notebook affected: https://github.com/aws-samples/eks-kubeflow-workshop/blob/master/notebooks/05_Kubeflow_Pipeline/05_03_Pipeline_mnist.ipynb

After you submit the pipeline, step 3 (kubeflow-launch-tfjob) fails with this error:

    Traceback (most recent call last):
      File "/ml/launch_tfjob.py", line 22, in <module>
        import launch_crd
    ImportError: No module named launch_crd

PatrickXYS commented 4 years ago

@dalbhanj thank you for reporting the issue. I'll take it over and fix it.

Jeffwan commented 4 years ago

@PatrickXYS I don't see this issue. The launch_crd module should come from the container image. Which image are we using now? Is this a regression after your refactor?

PatrickXYS commented 4 years ago

Yes, it should come from the container image, which should be the image hosted by me. That image no longer contains launch_crd. It looks like a regression introduced by my refactor; I'm working on it.

Jeffwan commented 4 years ago

@PatrickXYS As we don't have CI now, please make sure things work before checking in.

PatrickXYS commented 4 years ago

It should be fixed by the merged PR, @dalbhanj let us know if it's good to go.

dalbhanj commented 4 years ago

This isn't resolved yet because it only works in the 'us-west-2' region, yeah?

PatrickXYS commented 4 years ago

After investigation, region settings are not the root cause. Let's keep investigating and post updates here.

Jeffwan commented 4 years ago

@PatrickXYS

Can you extract region as an argument? Why do we stick to us-west-2 now?
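
For illustration, extracting the region as an argument could look roughly like the sketch below. It is not the workshop's actual code: the `parse_args` helper, the `--region` flag name, and the fallback to the `AWS_REGION` environment variable are all assumptions made for this example, with `us-west-2` kept only as the last-resort default.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse launcher arguments (hypothetical helper, not from the repo)."""
    parser = argparse.ArgumentParser(
        description="Illustrative launcher with a configurable AWS region"
    )
    parser.add_argument(
        "--region",
        # Assumed precedence: explicit flag, then AWS_REGION, then us-west-2.
        default=os.environ.get("AWS_REGION", "us-west-2"),
        help="AWS region to run against (default: $AWS_REGION or us-west-2)",
    )
    return parser.parse_args(argv)
```

With this shape, a call such as `parse_args(["--region", "eu-west-1"]).region` returns the explicit value, so the region is no longer fixed in the code and can be set per run.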