aws-samples / eks-kubeflow-workshop

Kubeflow workshop on EKS, mainly focused on AWS integration examples. See the Kubeflow website, http://kubeflow.org, for other examples.
Apache License 2.0
96 stars, 56 forks

05_03Pipeline_mnist.ipynb getting timeout in kubeflow-launch-tfjob #86

Open radhakrishnang opened 3 years ago

radhakrishnang commented 3 years ago

Notebook 05_03_Pipeline_mnist.ipynb

Describe the bug: The pipeline fails at step 3 (kubeflow-launch-tfjob) with a timeout while waiting for the TFJob to finish.

INFO:root:Current condition of kubeflow.org/tfjobs mnist-5500928c-3475-4dd7-9463-570f5c0a494a in namespace anonymous is Created.
Traceback (most recent call last):
  File "/ml/launch_tfjob.py", line 136, in <module>
    main()
  File "/ml/launch_tfjob.py", line 131, in main
    timeout=datetime.timedelta(minutes=args.tfjobTimeoutMinutes))
  File "/ml/launch_crd.py", line 75, in wait_for_condition
    "conditions {4}.".format(self.group, self.plural, name, namespace, expected_conditions))
Exception: Timeout waiting for kubeflow.org/tfjobs mnist-5500928c-3475-4dd7-9463-570f5c0a494a in namespace anonymous to enter one of the conditions ['Succeeded', 'Failed'].
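
For anyone reading the traceback: it suggests the launcher polls the TFJob until it reports `Succeeded` or `Failed`, and raises once `tfjobTimeoutMinutes` has elapsed. Below is a minimal sketch of that polling pattern, not the actual `launch_crd.py` code. The `get_conditions` callable is hypothetical and stands in for whatever Kubernetes API lookup returns the condition types currently set on the job.

```python
import datetime
import time


def wait_for_condition(get_conditions, expected_conditions, timeout,
                       polling_interval=datetime.timedelta(seconds=30)):
    """Poll a resource until it reports one of expected_conditions.

    get_conditions is a hypothetical callable (an assumption for this
    sketch) that returns the list of condition type strings currently
    set on the resource, for example ["Created"].
    """
    deadline = datetime.datetime.now() + timeout
    while True:
        conditions = get_conditions()
        # Return the first condition that matches what we are waiting for.
        for condition in conditions:
            if condition in expected_conditions:
                return condition
        # Give up once the deadline has passed.
        if datetime.datetime.now() >= deadline:
            raise TimeoutError(
                "Timeout waiting to enter one of the conditions {0}; "
                "last seen: {1}".format(expected_conditions, conditions))
        time.sleep(polling_interval.total_seconds())
```

In the log above the job never left `Created`, so a loop of this shape would time out no matter how the training code itself behaves. Raising the timeout alone may therefore not help; the reason the job stays in `Created` likely needs to be found first.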

Expected behavior: The Kubeflow pipeline executes successfully.

Screenshots: [image]

Environment (please complete the following information):