aws-samples / eks-workshop

AWS Workshop for Learning EKS
https://eksworkshop.com
MIT No Attribution

400 at mnist.py line 89 #553

Closed danheo412 closed 4 years ago

danheo412 commented 4 years ago

Hi, I was following the workshop on ML with EKS and Kubeflow, but I ran into a blocker during the steps on training a model. I followed the steps exactly, but the pod running the training job keeps failing with the output below. I tried three times and it fails the same way each time. I'm wondering if there is something wrong with the image, or with access control? I'm not sure. Could you please help me debug? The instructions are from: https://eksworkshop.com/kubeflow/training/

LOG OUTPUT:
workshop:~/environment/eksworkshop-eksctl $ kubectl logs mnist-training -f
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
32768/29515 [=================================] - 0s 0us/step
40960/29515 [=========================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26427392/26421880 [==============================] - 0s 0us/step
26435584/26421880 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
16384/5148 [===============================================================================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4423680/4422102 [==============================] - 0s 0us/step
4431872/4422102 [==============================] - 0s 0us/step
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-11-25 06:28:10.764270: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-11-25 06:28:10.787679: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2499995000 Hz
2019-11-25 06:28:10.787868: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4d6fec0 executing computations on platform Host. Devices:
2019-11-25 06:28:10.787890: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
Traceback (most recent call last):
  File "mnist.py", line 89, in <module>

train_images.shape: (60000, 28, 28, 1), of float64
test_images.shape: (10000, 28, 28, 1), of float64
_________________________________________________________________
Layer (type)                 Output Shape              Param #  
=================================================================
Conv1 (Conv2D)               (None, 13, 13, 8)         80       
_________________________________________________________________
flatten (Flatten)            (None, 1352)              0        
_________________________________________________________________
Softmax (Dense)              (None, 10)                13530    
    main()
  File "mnist.py", line 82, in main
=================================================================
Total params: 13,610
Trainable params: 13,610
Non-trainable params: 0
_________________________________________________________________
  model = train(train_images, train_labels, args.epochs, args.model_summary_path)
  File "mnist.py", line 51, in train
    model.fit(train_images, train_labels, epochs=epochs, callbacks=[tensorboard_callback])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/training.py", line 880, in fit
    validation_steps=validation_steps)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/training_arrays.py", line 215, in model_iteration
    mode=mode)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/callbacks.py", line 106, in configure_callbacks
    callback_list.set_model(callback_model)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/callbacks.py", line 178, in set_model
    callback.set_model(model)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/callbacks.py", line 1010, in set_model
    self._init_writer()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/callbacks.py", line 947, in _init_writer
    self.writer = tf_summary.FileWriter(self.log_dir, K.get_session().graph)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/summary/writer/writer.py", line 367, in __init__
    filename_suffix)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/summary/writer/event_file_writer.py", line 67, in __init__
    gfile.MakeDirs(self._logdir)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 442, in recursive_create_dir
    recursive_create_dir_v2(dirname)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 458, in recursive_create_dir_v2
    pywrap_tensorflow.RecursivelyCreateDir(compat.as_bytes(path), status)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: S3 path doesn't contain a bucket name: s3:///mnist/tf_summary
workshop:~/environment/eksworkshop-eksctl $
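The last line of the log shows an S3 URI with an empty bucket segment (`s3:///mnist/tf_summary`), which is what you would see if the bucket name were empty when the path was built. A minimal sketch of a pre-flight check follows, written for Python 3 (the container in the log runs Python 2.7). The function name `check_s3_uri` is hypothetical and not part of mnist.py.

```python
from urllib.parse import urlparse


def check_s3_uri(uri):
    """Return (bucket, key) for an s3:// URI; raise ValueError if the bucket is missing."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError("not an S3 URI: %r" % uri)
    # For "s3:///mnist/tf_summary" the netloc (bucket) part is an empty string.
    if not parsed.netloc:
        raise ValueError("S3 URI has no bucket name: %r" % uri)
    return parsed.netloc, parsed.path.lstrip("/")
```

Running a check like this on the summary path before starting the job would surface the empty bucket name without waiting for the pod to fail.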
danheo412 commented 4 years ago

I was told to check that the S3 path was set. I set it, and now I'm getting this error:

workshop:~/environment/eksworkshop-eksctl $ kubectl logs mnist-training -f
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
32768/29515 [=================================] - 0s 0us/step
40960/29515 [=========================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26427392/26421880 [==============================] - 0s 0us/step
26435584/26421880 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
16384/5148 [===============================================================================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4423680/4422102 [==============================] - 0s 0us/step
4431872/4422102 [==============================] - 0s 0us/step
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-11-26 07:17:23.914719: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-11-26 07:17:23.939677: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2499995000 Hz
2019-11-26 07:17:23.939857: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3ee9e00 executing computations on platform Host. Devices:
2019-11-26 07:17:23.939880: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-11-26 07:17:23.984441: I tensorflow/core/platform/s3/aws_logging.cc:54] Initializing config loader against fileName /root//.aws/config and using profilePrefix = 1
2019-11-26 07:17:23.984475: I tensorflow/core/platform/s3/aws_logging.cc:54] Initializing config loader against fileName /root//.aws/credentials and using profilePrefix = 0
2019-11-26 07:17:23.984489: I tensorflow/core/platform/s3/aws_logging.cc:54] Setting provider to read credentials from /root//.aws/credentials for credentials file and /root//.aws/config for the config file , for use with profile default
2019-11-26 07:17:23.984502: I tensorflow/core/platform/s3/aws_logging.cc:54] Creating HttpClient with max connections2 and scheme http
2019-11-26 07:17:23.984521: I tensorflow/core/platform/s3/aws_logging.cc:54] Initializing CurlHandleContainer with size 2
2019-11-26 07:17:23.984536: I tensorflow/core/platform/s3/aws_logging.cc:54] Creating Instance with default EC2MetadataClient and refresh rate 900000
2019-11-26 07:17:23.984555: I tensorflow/core/platform/s3/aws_logging.cc:54] Found secret key
2019-11-26 07:17:23.984598: I tensorflow/core/platform/s3/aws_logging.cc:54] Initializing CurlHandleContainer with size 25
2019-11-26 07:17:23.984650: I tensorflow/core/platform/s3/aws_logging.cc:54] Found secret key
2019-11-26 07:17:23.984776: I tensorflow/core/platform/s3/aws_logging.cc:54] Pool grown by 2
2019-11-26 07:17:23.984793: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2019-11-26 07:17:24.005064: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 400
2019-11-26 07:17:24.005103: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2019-11-26 07:17:24.005273: I tensorflow/core/platform/s3/aws_logging.cc:54] Found secret key
2019-11-26 07:17:24.005416: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2019-11-26 07:17:24.019755: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 400
2019-11-26 07:17:24.019789: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2019-11-26 07:17:24.019905: I tensorflow/core/platform/s3/aws_logging.cc:54] Found secret key
2019-11-26 07:17:24.020041: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2019-11-26 07:17:24.033487: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 400
2019-11-26 07:17:24.033521: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2019-11-26 07:17:24.033572: I tensorflow/core/platform/s3/aws_logging.cc:54] Found secret key
2019-11-26 07:17:24.033664: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2019-11-26 07:17:24.045873: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 400
2019-11-26 07:17:24.045905: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2019-11-26 07:17:24.045955: I tensorflow/core/platform/s3/aws_logging.cc:54] Found secret key
2019-11-26 07:17:24.046048: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2019-11-26 07:17:24.058349: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 400
2019-11-26 07:17:24.058382: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2019-11-26 07:17:24.058430: I tensorflow/core/platform/s3/aws_logging.cc:54] Found secret key
2019-11-26 07:17:24.058521: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2019-11-26 07:17:24.070959: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 400
2019-11-26 07:17:24.070990: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2019-11-26 07:17:24.071036: I tensorflow/core/platform/s3/aws_logging.cc:54] Found secret key
2019-11-26 07:17:24.071126: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2019-11-26 07:17:24.082978: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 400
2019-11-26 07:17:24.083010: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
Traceback (most recent call last):
  File "mnist.py", line 89, in <module>

train_images.shape: (60000, 28, 28, 1), of float64
test_images.shape: (10000, 28, 28, 1), of float64
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
Conv1 (Conv2D)               (None, 13, 13, 8)         80        
_________________________________________________________________
flatten (Flatten)            (None, 1352)              0         
_________________________________________________________________
Softmax (Dense)              (None, 10)                13530     
=================================================================
Total params: 13,610
Trainable params: 13,610
Non-trainable params: 0
_________________________________________________________________
    main()
  File "mnist.py", line 82, in main
    model = train(train_images, train_labels, args.epochs, args.model_summary_path)
  File "mnist.py", line 51, in train
    model.fit(train_images, train_labels, epochs=epochs, callbacks=[tensorboard_callback])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/training.py", line 880, in fit
    validation_steps=validation_steps)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/training_arrays.py", line 215, in model_iteration
    mode=mode)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/callbacks.py", line 106, in configure_callbacks
    callback_list.set_model(callback_model)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/callbacks.py", line 178, in set_model
    callback.set_model(model)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/callbacks.py", line 1010, in set_model
    self._init_writer()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/callbacks.py", line 947, in _init_writer
    self.writer = tf_summary.FileWriter(self.log_dir, K.get_session().graph)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/summary/writer/writer.py", line 367, in __init__
    filename_suffix)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/summary/writer/event_file_writer.py", line 67, in __init__
    gfile.MakeDirs(self._logdir)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 442, in recursive_create_dir
    recursive_create_dir_v2(dirname)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 458, in recursive_create_dir_v2
    pywrap_tensorflow.RecursivelyCreateDir(compat.as_bytes(path), status)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnknownError: : No response body. Response code: 400
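The log shows the S3 client finding credentials and then receiving HTTP 400 on every retry; it does not say why. One thing that may be worth ruling out is the S3 configuration visible inside the pod, since TensorFlow's S3 filesystem reads settings such as `AWS_REGION` and `S3_ENDPOINT` from environment variables. The sketch below only reports which of these are set. The helper name `report_s3_env` and the choice of variables are assumptions, not taken from the workshop, and nothing in the log confirms that any of them is the cause.

```python
import os

# Non-secret settings read by TensorFlow's S3 filesystem (credentials deliberately excluded).
S3_ENV_VARS = ("AWS_REGION", "S3_ENDPOINT", "S3_USE_HTTPS", "S3_VERIFY_SSL")


def report_s3_env(environ=None):
    """Return a dict mapping each S3-related variable to its value, or None if unset."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in S3_ENV_VARS}


if __name__ == "__main__":
    for name, value in report_s3_env().items():
        print("%s=%s" % (name, value if value is not None else "<unset>"))
```

Run inside the training container, this would show whether the region set for the job matches the region of the bucket.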
danheo412 commented 4 years ago

Following other posts, I'm posting the content of the kfctl_aws.0.7.0.yaml file:

workshop:~/environment/eksworkshop-eksctl $ cat kfctl_aws.0.7.0.yaml 
apiVersion: kfdef.apps.kubeflow.org/v1beta1
kind: KfDef
metadata:
  creationTimestamp: null
  name: eksworkshop-eksctl
  namespace: kubeflow
spec:
  applications:
  - kustomizeConfig:
      parameters:
      - name: namespace
        value: istio-system
      repoRef:
        name: manifests
        path: istio/istio-crds
    name: istio-crds
  - kustomizeConfig:
      parameters:
      - name: namespace
        value: istio-system
      repoRef:
        name: manifests
        path: istio/istio-install
    name: istio-install
  - kustomizeConfig:
      parameters:
      - name: clusterRbacConfig
        value: "OFF"
      repoRef:
        name: manifests
        path: istio/istio
    name: istio
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: application/application-crds
    name: application-crds
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: application/application
    name: application
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: metacontroller
    name: metacontroller
  - kustomizeConfig:
      overlays:
      - istio
      - application
      repoRef:
        name: manifests
        path: argo
    name: argo
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: kubeflow-roles
    name: kubeflow-roles
  - kustomizeConfig:
      overlays:
      - istio
      - application
      repoRef:
        name: manifests
        path: common/centraldashboard
    name: centraldashboard
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: admission-webhook/webhook
    name: webhook
  - kustomizeConfig:
      overlays:
      - application
      parameters:
      - name: webhookNamePrefix
        value: admission-webhook-
      repoRef:
        name: manifests
        path: admission-webhook/bootstrap
    name: bootstrap
  - kustomizeConfig:
      overlays:
      - istio
      - application
      repoRef:
        name: manifests
        path: jupyter/jupyter-web-app
    name: jupyter-web-app
  - kustomizeConfig:
      overlays:
      - istio
      repoRef:
        name: manifests
        path: metadata
    name: metadata
  - kustomizeConfig:
      overlays:
      - istio
      - application
      repoRef:
        name: manifests
        path: jupyter/notebook-controller
    name: notebook-controller
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: pytorch-job/pytorch-job-crds
    name: pytorch-job-crds
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: pytorch-job/pytorch-operator
    name: pytorch-operator
  - kustomizeConfig:
      overlays:
      - application
      parameters:
      - name: usageId
        value: "144553881180253599"
      - name: reportUsage
        value: "true"
      repoRef:
        name: manifests
        path: common/spartakus
    name: spartakus
  - kustomizeConfig:
      overlays:
      - istio
      repoRef:
        name: manifests
        path: tensorboard
    name: tensorboard
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: tf-training/tf-job-crds
    name: tf-job-crds
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: tf-training/tf-job-operator
    name: tf-job-operator
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: katib/katib-crds
    name: katib-crds
  - kustomizeConfig:
      overlays:
      - application
      - istio
      repoRef:
        name: manifests
        path: katib/katib-controller
    name: katib-controller
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: pipeline/api-service
    name: api-service
  - kustomizeConfig:
      overlays:
      - application
      parameters:
      - name: minioPvcName
        value: minio-pv-claim
      repoRef:
        name: manifests
        path: pipeline/minio
    name: minio
  - kustomizeConfig:
      overlays:
      - application
      parameters:
      - name: mysqlPvcName
        value: mysql-pv-claim
      repoRef:
        name: manifests
        path: pipeline/mysql
    name: mysql
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: pipeline/persistent-agent
    name: persistent-agent
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: pipeline/pipelines-runner
    name: pipelines-runner
  - kustomizeConfig:
      overlays:
      - istio
      - application
      repoRef:
        name: manifests
        path: pipeline/pipelines-ui
    name: pipelines-ui
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: pipeline/pipelines-viewer
    name: pipelines-viewer
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: pipeline/scheduledworkflow
    name: scheduledworkflow
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: pipeline/pipeline-visualization-service
    name: pipeline-visualization-service
  - kustomizeConfig:
      overlays:
      - application
      - istio
      repoRef:
        name: manifests
        path: profiles
    name: profiles
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: seldon/seldon-core-operator
    name: seldon-core
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: mpi-job/mpi-operator
    name: mpi-operator
  - kustomizeConfig:
      parameters:
      - name: namespace
        value: istio-system
      repoRef:
        name: manifests
        path: aws/istio-ingress
    name: istio-ingress
  - kustomizeConfig:
      overlays:
      - application
      parameters:
      - name: clusterName
        value: eksworkshop-eksctl
      repoRef:
        name: manifests
        path: aws/aws-alb-ingress-controller
    name: aws-alb-ingress-controller
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: aws/nvidia-device-plugin
    name: nvidia-device-plugin
  plugins:
  - kind: KfAwsPlugin
    metadata:
      creationTimestamp: null
      name: aws
    spec:
      auth:
        basicAuth:
          password:
            name: password
          username: admin
      region: us-west-2
      roles:
      - eksctl-eksworkshop-eksctl-nodegro-NodeInstanceRole-1HY8SCMLKFYS5
  repos:
  - name: manifests
    uri: https://github.com/kubeflow/manifests/archive/v0.7-branch.tar.gz
  version: master
status:
  reposCache:
  - localPath: '"/home/ec2-user/environment/eksworkshop-eksctl/.cache/manifests/manifests-0.7-branch"'
    name: manifests
danheo412 commented 4 years ago

I'm also curious what the best way to debug this issue is. For example, how do I make sense of the error stack?

  File "mnist.py", line 82, in main
    model = train(train_images, train_labels, args.epochs, args.model_summary_path)
  File "mnist.py", line 51, in train
    model.fit(train_images, train_labels, epochs=epochs, callbacks=[tensorboard_callback])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/training.py", line 880, in fit
    validation_steps=validation_steps)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/training_arrays.py", line 215, in model_iteration
    mode=mode)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/callbacks.py", line 106, in configure_callbacks
    callback_list.set_model(callback_model)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/callbacks.py", line 178, in set_model
    callback.set_model(model)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/callbacks.py", line 1010, in set_model
    self._init_writer()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/callbacks.py", line 947, in _init_writer
    self.writer = tf_summary.FileWriter(self.log_dir, K.get_session().graph)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/summary/writer/writer.py", line 367, in __init__
    filename_suffix)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/summary/writer/event_file_writer.py", line 67, in __init__
    gfile.MakeDirs(self._logdir)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 442, in recursive_create_dir
    recursive_create_dir_v2(dirname)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 458, in recursive_create_dir_v2
    pywrap_tensorflow.RecursivelyCreateDir(compat.as_bytes(path), status)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnknownError: : No response body. Response code: 400
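One way to read a stack like this is bottom-up: the last line is the error itself, and the deepest frame in your own code (here `mnist.py`) is where the script handed control to the library. A minimal sketch that does this split automatically follows. The function name `summarize_traceback` and the use of `dist-packages` as the marker for library code are assumptions made for illustration.

```python
import re

# Matches traceback frame lines such as:  File "mnist.py", line 51, in train
FRAME = re.compile(r'^\s*File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')


def summarize_traceback(text, library_marker="dist-packages"):
    """Split frames into user and library code and pick out the final error line."""
    user, library = [], []
    last_line = ""
    for raw in text.splitlines():
        match = FRAME.match(raw)
        if match:
            frame = (match.group("path"), int(match.group("line")), match.group("func"))
            if library_marker in match.group("path"):
                library.append(frame)
            else:
                user.append(frame)
        elif raw.strip():
            # Remember the most recent non-frame line; the last one is the error message.
            last_line = raw.strip()
    return {
        "error": last_line,
        "deepest_user_frame": user[-1] if user else None,
        "library_frames": len(library),
    }
```

Applied to the stack above, the deepest user frame is `mnist.py` line 51 in `train`, the `model.fit(...)` call that passes `tensorboard_callback`. The library frames below it show the TensorBoard callback's `FileWriter` calling `gfile.MakeDirs` on the log directory, so the failure happens while creating the summary directory, not during training itself.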
dalbhanj commented 4 years ago

Hi @danheo412, thanks for the details. This PR fixes the issue you are experiencing: https://github.com/aws-samples/eks-workshop/pull/543/files

The PR has been merged; it should show up on the main workshop site in a few minutes.

Let me know if the fix resolves the issue.

dalbhanj commented 4 years ago

Closing since this is resolved.