akartsky opened 2 years ago
These are the errors you might see in the TensorBoard pod logs when you try to use S3:
1] This one occurs because AWS_REGION must be specified as an environment variable for the pod:
2022-03-17 19:17:23.774900: W tensorflow/core/platform/s3/aws_logging.cc:57] Encountered Unknown AWSError 'PermanentRedirect': The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.
2022-03-17 19:17:23.774947: E tensorflow/core/platform/s3/aws_logging.cc:60] HTTP response code: 301
2] This one occurs because the pod does not have permission to access the S3 bucket (if no secrets are provided, the pod falls back to the node IAM role, which does not have S3 access):
2022-03-17 18:51:05.696810: W tensorflow/core/platform/s3/aws_logging.cc:57] Encountered AWSError 'AccessDenied': Access Denied
2022-03-17 18:51:05.696857: E tensorflow/core/platform/s3/aws_logging.cc:60] HTTP response code: 403
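To tell the two failures apart quickly, the pod's log output can be matched against the error strings quoted above. A minimal Python sketch, assuming you pass it the text printed by kubectl logs for the TensorBoard pod; the function name diagnose and the cause table are hypothetical, not part of Kubeflow:

```python
# Hypothetical helper: map the S3 errors quoted in this issue to their likely cause.
# The (AWS error, HTTP status, cause) table is taken from the two log excerpts above;
# it is a heuristic, not an exhaustive list of TensorFlow S3 errors.
KNOWN_S3_ERRORS = [
    ("PermanentRedirect", 301, "AWS_REGION is not set for the pod"),
    ("AccessDenied", 403, "no credentials with S3 access (pod fell back to the node IAM role)"),
]


def diagnose(log_text: str) -> list[str]:
    """Return the likely cause of each known S3 error found in TensorBoard log output."""
    causes = []
    for aws_error, http_code, cause in KNOWN_S3_ERRORS:
        # TensorFlow logs the AWS error name in quotes and the HTTP status on a separate line.
        if f"'{aws_error}'" in log_text or f"HTTP response code: {http_code}" in log_text:
            causes.append(cause)
    return causes
```

For example, save the output of kubectl logs deployment/<name_for_your_tensorboard> -n <your_kubeflow_user_namespace> to a file and call diagnose() on its contents; causes are returned in the order of the table.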
The current implementation of the TensorBoard controller does not mount AWS secrets and does not offer a ConfigMap for passing environment variables to the TensorBoard pod.
Workaround (not a good one: you have to repeat it for every TensorBoard pod that you launch):
1] Create an AWS secret in the Kubeflow user namespace (the IAM user the keys belong to must have S3 access to the bucket). E.g.:
apiVersion: v1
kind: Secret
metadata:
  name: aws-secret
  namespace: <your_kubeflow_user_namespace>
type: Opaque
data:
  AWS_ACCESS_KEY_ID: <base_64_key>
  AWS_SECRET_ACCESS_KEY: <base_64_secret>
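Values under data must be base64-encoded. A minimal Python sketch that builds the same Secret as a dict and does the encoding for you, assuming you supply your own IAM user's keys; the helper names b64 and aws_secret_manifest are hypothetical:

```python
import base64


def b64(value: str) -> str:
    """Base64-encode a string, as Kubernetes requires for values under a Secret's data field."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def aws_secret_manifest(namespace: str, access_key_id: str, secret_access_key: str,
                        name: str = "aws-secret") -> dict:
    """Build the Secret from step 1] as a dict (hypothetical helper).

    The IAM user owning these keys must have S3 access to the bucket.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            "AWS_ACCESS_KEY_ID": b64(access_key_id),
            "AWS_SECRET_ACCESS_KEY": b64(secret_access_key),
        },
    }
```

Serialize the result with json.dumps and pipe it to kubectl apply -f -, which accepts JSON as well as YAML. Alternatively, kubectl create secret generic aws-secret --from-literal=AWS_ACCESS_KEY_ID=<key> --from-literal=AWS_SECRET_ACCESS_KEY=<secret> -n <your_kubeflow_user_namespace> performs the encoding itself.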
2] Launch a TensorBoard from the UI with an S3 object storage link. E.g.:
Name: <name_for_your_tensorboard>
Object Storage Link: s3://<your_bucket_name>
(The current Kubeflow deployment uses TensorBoard version 2.1.0.)
3] Edit the deployment of the TensorBoard pod that was just created:
kubectl edit deployment <name_for_your_tensorboard> -n <your_kubeflow_user_namespace>
Then add the following environment variables to the container (at the same level as args, command, and image):
env:
- name: AWS_REGION
  value: <your_s3_bucket_region>
- name: AWS_ACCESS_KEY_ID
  valueFrom:
    secretKeyRef:
      name: aws-secret
      key: AWS_ACCESS_KEY_ID
- name: AWS_SECRET_ACCESS_KEY
  valueFrom:
    secretKeyRef:
      name: aws-secret
      key: AWS_SECRET_ACCESS_KEY
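Instead of editing the deployment by hand, the same env block can be applied non-interactively with kubectl patch --type=json. A Python sketch that generates the JSON Patch (RFC 6902), assuming the TensorBoard container is the first container in the pod template; tensorboard_env_patch and patch_json are hypothetical helpers:

```python
import json


def tensorboard_env_patch(region: str, secret_name: str = "aws-secret") -> list[dict]:
    """JSON Patch (RFC 6902) that sets the env block from step 3] on the deployment.

    Assumption: the TensorBoard container is containers[0] in the pod template.
    """

    def from_secret(key: str) -> dict:
        # Read one key of the Secret created in step 1].
        return {"name": key, "valueFrom": {"secretKeyRef": {"name": secret_name, "key": key}}}

    env = [
        {"name": "AWS_REGION", "value": region},
        from_secret("AWS_ACCESS_KEY_ID"),
        from_secret("AWS_SECRET_ACCESS_KEY"),
    ]
    # Per RFC 6902, "add" on an existing object member replaces its value,
    # so any env list already on the container is overwritten.
    return [{"op": "add", "path": "/spec/template/spec/containers/0/env", "value": env}]


def patch_json(region: str) -> str:
    """Serialize the patch for use as the -p argument of kubectl patch --type=json."""
    return json.dumps(tensorboard_env_patch(region))
```

Pass the string returned by patch_json("<your_s3_bucket_region>") to kubectl patch deployment <name_for_your_tensorboard> -n <your_kubeflow_user_namespace> --type=json -p '<patch>'. Check the container index first if the pod template has more than one container.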
If you now open the UI of the TensorBoard you created, it should work.
Proposed fix: make code changes in the TensorBoard controller:
1] Modify the TensorBoard controller to accept a ConfigMap input so that users can specify environment variables.
2] Mount AWS credentials the same way the controller currently does for GCS.
A PR for this still needs to be worked on.
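The two proposed controller changes could look roughly like the following. This is a hypothetical Python sketch of the logic only, not the controller's actual code (the controller itself is written in Go, and the parameter names here are made up). It uses the standard Kubernetes envFrom field: a configMapRef for user-supplied variables such as AWS_REGION, and a secretRef for the AWS keys. Whether to mount credentials as a file, as for GCS, or expose them as environment variables is a design choice; this sketch uses environment variables because that is what the workaround above shows TensorFlow's S3 support reading.

```python
from typing import Optional


def tensorboard_container_env(logspath: str, env_configmap: Optional[str] = None,
                              aws_secret: str = "aws-secret") -> dict:
    """Hypothetical sketch of the proposed controller changes (not the real controller code).

    Returns the env-related fields to merge into the TensorBoard container spec.
    """
    env_from = []
    if env_configmap:
        # Change 1]: user-supplied environment variables (e.g. AWS_REGION) from a ConfigMap.
        env_from.append({"configMapRef": {"name": env_configmap}})
    if logspath.startswith("s3://"):
        # Change 2]: expose every key of the AWS Secret (AWS_ACCESS_KEY_ID,
        # AWS_SECRET_ACCESS_KEY) as an environment variable when logs live on S3.
        env_from.append({"secretRef": {"name": aws_secret}})
    return {"envFrom": env_from} if env_from else {}
```

With this shape, a user would only create the Secret and a ConfigMap holding AWS_REGION once per namespace, instead of editing every TensorBoard deployment.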
Upstream issue: https://github.com/kubeflow/kubeflow/issues/6493
The "Support TensorBoard in Kubeflow Pipelines" section of the documentation is outdated: https://www.kubeflow.org/docs/distributions/aws/pipeline/#support-tensorboard-in-kubeflow-pipelines
Outdated doc:
TensorBoard needs some extra settings on AWS like below:
Create a Kubernetes secret aws-secret in the kubeflow namespace. Follow instructions here.
Create a ConfigMap to store the configuration of TensorBoard on your cluster. Replace with your S3 region.