deis / builder

Git server and application builder for Deis Workflow
https://deis.com
MIT License

Cannot use AWS IAM roles instead of AWS accessKey #486

Open Akshaykapoor opened 7 years ago

Akshaykapoor commented 7 years ago

I upgraded my cluster from Workflow v2.10.0 to v2.11.0. As part of this upgrade I changed the storage backend to off-cluster S3.

My values.yaml looks something like the one below. I've also given the nodes full S3 access. Nothing failed during installation, except that my registry and builder components are in CrashLoopBackOff with the following errors:

Registry logs

2017/02/16 13:58:53 INFO: Starting registry...
2017/02/16 13:58:53 INFO: using s3 as the backend
2017/02/16 13:58:53 open /var/run/secrets/deis/registry/creds/accesskey: no such file or directory

Builder logs

2017/02/16 13:58:29 Running in debug mode
2017/02/16 13:58:29 Error creating storage driver (AccessDenied: Access Denied
    status code: 403, request id: DEB87202BB385735)
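
The registry error above suggests the component opens the mounted secret files unconditionally instead of falling back to the node's IAM role. For illustration, here is a minimal sketch in Go (using aws-sdk-go v1) of the fallback behavior the reporter expects; the helper name and fallback logic are assumptions, not Workflow's actual implementation:

package main

import (
	"fmt"
	"io/ioutil"

	"github.com/aws/aws-sdk-go/aws/credentials"
)

// s3Credentials is a hypothetical helper: use the mounted secret files when
// they exist and are non-empty; otherwise return nil so the SDK falls back
// to its default chain (env vars, shared config, then the EC2 instance
// metadata service, i.e. the node's IAM role).
func s3Credentials() *credentials.Credentials {
	key, errKey := ioutil.ReadFile("/var/run/secrets/deis/registry/creds/accesskey")
	secret, errSecret := ioutil.ReadFile("/var/run/secrets/deis/registry/creds/secretkey")
	if errKey == nil && errSecret == nil && len(key) > 0 && len(secret) > 0 {
		return credentials.NewStaticCredentials(string(key), string(secret), "")
	}
	return nil // a nil Credentials in aws.Config selects the default chain
}

func main() {
	if s3Credentials() == nil {
		fmt.Println("no mounted creds; the default chain (IAM role) would be used")
	} else {
		fmt.Println("using static credentials from the mounted secret")
	}
}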

Is there a way I can explicitly tell Workflow not to use accessKey and secretKey in the values.yaml file when installing?

The yaml file says that if you leave these blank it will use IAM roles. I'm not convinced it is actually using IAM roles, because the registry logs show it trying to open the creds directory.

Am I missing something, or is the only way around this to provide an accessKey and secretKey?

values.yaml

# This is the global configuration file for Workflow

global:
  # Set the storage backend
  #
  # Valid values are:
  # - s3: Store persistent data in AWS S3 (configure in S3 section)
  # - azure: Store persistent data in Azure's object storage
  # - gcs: Store persistent data in Google Cloud Storage
  # - minio: Store persistent data on in-cluster Minio server
  storage: s3

  # Set the location of Workflow's PostgreSQL database
  #
  # Valid values are:
  # - on-cluster: Run PostgreSQL within the Kubernetes cluster (credentials are generated
  #   automatically; backups are sent to object storage
  #   configured above)
  # - off-cluster: Run PostgreSQL outside the Kubernetes cluster (configure in database section)
  database_location: "off-cluster"

  # Set the location of Workflow's logger-specific Redis instance
  #
  # Valid values are:
  # - on-cluster: Run Redis within the Kubernetes cluster
  # - off-cluster: Run Redis outside the Kubernetes cluster (configure in loggerRedis section)
  logger_redis_location: "on-cluster"

  # Set the location of Workflow's influxdb cluster
  #
  # Valid values are:
  # - on-cluster: Run Influxdb within the Kubernetes cluster
  # - off-cluster: Influxdb is running outside of the cluster and credentials and connection information will be provided.
  influxdb_location: "on-cluster"
  # Set the location of Workflow's grafana instance
  #
  # Valid values are:
  # - on-cluster: Run Grafana within the Kubernetes cluster
  # - off-cluster: Grafana is running outside of the cluster
  grafana_location: "on-cluster"

  # Set the location of Workflow's Registry
  #
  # Valid values are:
  # - on-cluster: Run registry within the Kubernetes cluster
  # - off-cluster: Use registry outside the Kubernetes cluster (example: dockerhub,quay.io,self-hosted)
  # - ecr: Use Amazon's ECR
  # - gcr: Use Google's GCR
  registry_location: "on-cluster"
  # The host port to which the registry proxy binds
  host_port: 5555
  # Prefix for the imagepull secret created when using private registry
  secret_prefix: "private-registry"

s3:
  # Your AWS access key. Leave it empty if you want to use IAM credentials.
  accesskey: ""
  # Your AWS secret key. Leave it empty if you want to use IAM credentials.
  secretkey: ""
  # Any S3 region
  region: "us-east-1"
  # Your buckets.
  registry_bucket: "REDACTED"
  database_bucket: "REDACTED"
  builder_bucket: "REDACTED"

azure:
  accountname: "YOUR ACCOUNT NAME"
  accountkey: "YOUR ACCOUNT KEY"
  registry_container: "your-registry-container-name"
  database_container: "your-database-container-name"
  builder_container: "your-builder-container-name"

gcs:
  # key_json is expanded into a JSON file on the remote server. It must be
  # well-formatted JSON data.
  key_json: <base64-encoded JSON data>
  registry_bucket: "your-registry-bucket-name"
  database_bucket: "your-database-bucket-name"
  builder_bucket: "your-builder-bucket-name"

swift:
  username: "Your OpenStack Swift Username"
  password: "Your OpenStack Swift Password"
  authurl: "Swift auth URL for obtaining an auth token"
  # Your OpenStack tenant name if you are using auth version 2 or 3.
  tenant: ""
  authversion: "Your OpenStack swift auth version"
  registry_container: "your-registry-container-name"
  database_container: "your-database-container-name"
  builder_container: "your-builder-container-name"

# Set the default (global) way of how Application (your own) images are
# pulled from within the Controller.
# This can be configured per Application as well in the Controller.
#
# This affects pull apps and git push (slugrunner images) apps
#
# Valid values are:
# - Always
# - IfNotPresent
controller:
  app_pull_policy: "IfNotPresent"
  # Possible values are:
  # enabled - allows for open registration
  # disabled - turns off open registration
  # admin_only - allows for registration by an admin only.
  registration_mode: "enabled"

database:
  # The username and password to be used by the on-cluster database.
  # If left empty they will be generated using randAlphaNum
  username: ""
  password: ""
  # Configure the following ONLY if using an off-cluster PostgreSQL database
  postgres:
    name: "database name"
    username: "database username"
    password: "database password"
    host: "database host"
    port: "database port"

redis:
  # Configure the following ONLY if using an off-cluster Redis instance for logger
  db: "0"
  host: "redis host"
  port: "redis port"
  password: "redis password" # "" == no password

fluentd:
  syslog:
    # Configure the following ONLY if using Fluentd to send log messages to both
    # the Logger component and external syslog endpoint
    # external syslog endpoint url
    host: ""
    # external syslog endpoint port
    port: ""

monitor:
  grafana:
    user: "admin"
    password: "admin"
  # Configure the following ONLY if using an off-cluster Influx database
  influxdb:
    url: "my.influx.url"
    database: "kubernetes"
    user: "user"
    password: "password"

registry-token-refresher:
  # Time in minutes after which the token should be refreshed.
  # Leave it empty to use the default provider time.
  token_refresh_time: ""
  off_cluster_registry:
    hostname: ""
    organization: ""
    username: ""
    password: ""
  ecr:
    # Your AWS access key. Leave it empty if you want to use IAM credentials.
    accesskey: ""
    # Your AWS secret key. Leave it empty if you want to use IAM credentials.
    secretkey: ""
    # Any AWS region
    region: "us-west-2"
    registryid: ""
    hostname: ""
  gcr:
    key_json: <base64-encoded JSON data>
    hostname: ""

router:
  dhparam: ""
  # Any custom router annotations (https://github.com/deis/router#annotations)
  # which need to be applied can be specified as key-value pairs under "deployment_annotations"
  deployment_annotations:
    #<example-key>: <example-value>

  # Any custom annotations for k8s services like http://kubernetes.io/docs/user-guide/services/#ssl-support-on-aws
  # which need to be applied can be specified as key-value pairs under "service_annotations"
  service_annotations:
    #<example-key>: <example-value>

  # Enable to pin router pod hostPort when using minikube or vagrant
  host_port:
    enabled: true

  # Service type defaults to LoadBalancer
  # service_type: LoadBalancer

workflow-manager:
  versions_api_url: https://versions.deis.com
  doctor_api_url: https://doctor.deis.com

vdice commented 7 years ago

The yaml file says that if you leave these blank it will use IAM roles. I'm not convinced it is actually using IAM roles, because the registry logs show it trying to open the creds directory.

I don't have direct experience with failures related to IAM roles, but my first guess is that the issue lies in the IAM role(s)/permissions set up and used by the cluster. Perhaps double-check that the same role(s)/permissions work properly elsewhere (say, via aws CLI invocations in a terminal).
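
For example, one way to test this from a node (or a one-off pod) is a small Go program that relies only on the SDK's default credential chain. This is a hedged sketch using aws-sdk-go v1, and "your-builder-bucket" is a placeholder for the redacted bucket name:

package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	// No static credentials are supplied, so the SDK walks its default
	// chain: env vars, shared config, then EC2 instance metadata (IAM role).
	sess := session.Must(session.NewSession(&aws.Config{
		Region: aws.String("us-east-1"),
	}))
	// "your-builder-bucket" is a placeholder bucket name.
	out, err := s3.New(sess).ListObjects(&s3.ListObjectsInput{
		Bucket: aws.String("your-builder-bucket"),
	})
	if err != nil {
		log.Fatalf("S3 access via the IAM role failed: %v", err)
	}
	fmt.Printf("IAM role works: listed %d objects\n", len(out.Contents))
}

If this fails with the same 403, the problem is in the role/policy itself; if it succeeds, the problem is in how Workflow's components pick up credentials.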

blakebarnett commented 7 years ago

I ran into this as well. I realized it fails during the build because builder generates a pod spec that doesn't have the IAM role annotation. The dockerbuilder/slugbuilder pod spec would need to inherit the AWS role annotation used for kube2iam.

Maybe something like this would work, though it would need to be conditional, etc. (see the conditional sketch after the diff):

index 8418cc6..d68771f 100644
--- a/pkg/gitreceive/k8s_util.go
+++ b/pkg/gitreceive/k8s_util.go
@@ -28,6 +28,7 @@ const (
        builderStorage   = "BUILDER_STORAGE"
        objectStorePath  = "/var/run/secrets/deis/objectstore/creds"
        envRoot          = "/tmp/env"
+       iamRole          = "IAM_ROLE"
 )

 func dockerBuilderPodName(appName, shortSha string) string {
@@ -166,6 +167,9 @@ func buildPod(
                        Labels: map[string]string{
                                "heritage": name,
                        },
+                       Annotations: map[string]string{
+                               "iam.amazonaws.com/role": os.Getenv(iamRole), // the role name itself, not the env var name
+                       },
                },
        }
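
Building on that diff, a conditional version might look like the sketch below. This is an assumption-laden sketch, not builder's actual API: the IAM_ROLE environment variable, the helper name, and the package name are all hypothetical, and the real pkg/gitreceive/k8s_util.go would need the "os" import.

package gitreceive

import "os"

// podAnnotations is a hypothetical helper for buildPod: it returns the
// kube2iam annotation only when a role has been configured via the assumed
// IAM_ROLE environment variable, leaving clusters without kube2iam untouched.
func podAnnotations() map[string]string {
	if role := os.Getenv("IAM_ROLE"); role != "" {
		return map[string]string{"iam.amazonaws.com/role": role}
	}
	return nil
}
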
Cryptophobia commented 7 years ago

@Akshaykapoor @blakebarnett @vdice Any news on this? We are experiencing the same 403 Access Denied problem even though the IAM policies are correct. It seems deis-builder is not respecting the blank access and secret keys and is trying to use them anyway.

blakebarnett commented 7 years ago

Sorry, we had to stop using Deis Workflow because of this and a few other reasons.

Cryptophobia commented 7 years ago

@blakebarnett, may I ask what the other reasons were?

Just interested in learning what else you ran into that was a deal breaker for you and your team.

blakebarnett commented 7 years ago

Here's the non-sugar-coated list.

Cryptophobia commented 7 years ago

Thank you for the list @blakebarnett. We were aware of most of these. Agreed that it would be nice to have kube2iam integration for the pods, and it looks like RBAC is being added in the next version.

The single-app-per-namespace model is annoying as well.

What did you guys end up using instead, or did you just go with some kind of custom Kubernetes setup?

blakebarnett commented 7 years ago

We're just building everything using CI and Helm charts for now, in the hope that at some later point everything will play nicely and we can provide PaaS features.

Cryptophobia commented 6 years ago

This issue was moved to teamhephy/builder#18