argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

An error occurred (403) when calling the HeadBucket operation: Forbidden #387

Closed · hphsu closed this issue 6 years ago

hphsu commented 6 years ago

I'm trying to install Argo on an existing AWS Kubernetes cluster, but I run into the following S3-related permission error.

argo cluster ops> argocluster install-argo-only --cluster-name t2.test.eng.applatix.net --cloud-profile dev --cluster-bucket applatixtest3 --kubeconfig /tmp/ax_kube/t2.test.eng.applatix.net --cloud-region us-west-2
2017-10-18T17:35:13 INFO ax.cluster_management.argo_cluster_manager MainThread: Installing Argo platform ...
2017-10-18T17:35:13 INFO ax.cluster_management.argo_cluster_manager MainThread: s3 bucket endpoint: None
2017-10-18T17:35:14 INFO ax.cluster_management.app.options.install_options MainThread: Cloud placement not provided, setting it to us-west-2a from currently available zones ['us-west-2a', 'us-west-2b', 'us-west-2c']
2017-10-18T17:35:14 INFO ax.meta.cluster_id MainThread: Instantiating cluster bucket ...
2017-10-18T17:35:23 INFO ax.cloud.aws.aws_s3 MainThread: Using region None for bucket applatixtest3
2017-10-18T17:35:32 INFO ax.cluster_management.app.common MainThread: Cannot find cluster name id: An error occurred (403) when calling the HeadBucket operation: Forbidden. Cluster is not yet created.
2017-10-18T17:35:32 INFO ax.meta.cluster_id MainThread: Cluster id not provided, generate one.
2017-10-18T17:35:32 INFO ax.meta.cluster_id MainThread: Created new name-id t2.test.eng.applatix.net-bd98d9c6-b42a-11e7-a626-025000000001
2017-10-18T17:35:32 INFO ax.meta.config_s3_path MainThread: Using AX cluster config path applatixtest3
2017-10-18T17:35:35 INFO ax.cloud.aws.aws_s3 MainThread: Using region None for bucket applatixtest3
2017-10-18T17:35:38 INFO ax.cloud.aws.aws_s3 MainThread: Using region None for bucket applatixtest3
2017-10-18T17:35:39 INFO ax.platform.ax_cluster_info MainThread: Downloading cluster current state ...
2017-10-18T17:35:49 ERROR ax.cluster_management.argo_cluster_manager MainThread: An error occurred (403) when calling the HeadBucket operation: Forbidden
Traceback (most recent call last):
  File "/ax/python/ax/cluster_management/argo_cluster_manager.py", line 86, in parse_args_and_run
    getattr(self, cmd)(args)
  File "/ax/python/ax/cluster_management/argo_cluster_manager.py", line 265, in install_argo_only
    PlatformOnlyInstaller(platform_install_config).run()
  File "/ax/python/ax/cluster_management/app/cluster_installer.py", line 514, in __init__
    self._ci_installer = ClusterInstaller(cfg=self._cfg.get_install_config(), kubeconfig=self._cfg.kube_config)
  File "/ax/python/ax/cluster_management/app/cluster_installer.py", line 70, in __init__
    dry_run=self._cfg.dry_run
  File "/ax/python/ax/cluster_management/app/common.py", line 83, in __init__
    self._csm = ClusterStateMachine(cluster_name_id=self._idobj.get_cluster_name_id(), cloud_profile=cloud_profile)
  File "/ax/python/ax/cluster_management/app/state/state.py", line 45, in __init__
    current_state = self._cluster_info.download_cluster_current_state() or ClusterState.UNKNOWN
  File "/ax/python/ax/platform/ax_cluster_info.py", line 260, in download_cluster_current_state
    return self._bucket.get_object(key=self._s3_cluster_current_state)
  File "/ax/python/ax/cloud/aws/aws_s3.py", line 365, in get_object
    if not self.exists():
  File "/ax/python/ax/cloud/aws/aws_s3.py", line 285, in exists
    return self._exists()
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 212, in call
    raise attempt.get()
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/ax/python/ax/cloud/aws/aws_s3.py", line 565, in _exists
    raise ce
ClientError: An error occurred (403) when calling the HeadBucket operation: Forbidden

 !!! Operation failed due to runtime error: An error occurred (403) when calling the HeadBucket operation: Forbidden
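
The failing call can be reproduced outside the installer with a few lines of boto3 (a minimal sketch, using the bucket name and region from the log above; it tests whichever credentials boto3 resolves by default):

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3", region_name="us-west-2")
try:
    s3.head_bucket(Bucket="applatixtest3")
    print("HeadBucket succeeded: these credentials can see the bucket")
except ClientError as e:
    # A 403 here means the resolved credentials lack bucket-level access,
    # or they belong to a different account/profile than expected.
    print("HeadBucket failed:", e.response["Error"]["Code"])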

Here is the minion (node) IAM role policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::applatixtest3/*",
                "arn:aws:s3:::applatix-*",
                "arn:aws:s3:::ax-public",
                "arn:aws:s3:::ax-public/*"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "ec2:Describe*",
                "ec2:CreateVolume",
                "ec2:DeleteVolume",
                "ec2:AttachVolume",
                "ec2:DetachVolume",
                "ec2:ReplaceRoute",
                "ec2:CreateSnapshot",
                "ec2:DeleteSnapshot",
                "ec2:AuthorizeSecurityGroupIngress",
                "ec2:AuthorizeSecurityGroupEgress",
                "ec2:RevokeSecurityGroupIngress",
                "ec2:RevokeSecurityGroupEgress",
                "ec2:RunInstances",
                "ec2:TerminateInstances",
                "ec2:AssociateAddress",
                "ec2:CreateTags",
                "ec2:CreateSecurityGroup",
                "ec2:DeleteSecurityGroup",
                "ec2:DescribeSecurityGroups"
            ],
            "Resource": "*",
            "Effect": "Allow"
        },
        {
            "Action": "route53:*",
            "Resource": "*",
            "Effect": "Allow"
        },
        {
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup",
                "autoscaling:UpdateAutoScalingGroup",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:CreateLaunchConfiguration",
                "autoscaling:DeleteLaunchConfiguration",
                "autoscaling:AttachLoadBalancers",
                "autoscaling:DetachLoadBalancers"
            ],
            "Resource": "*",
            "Effect": "Allow"
        },
        {
            "Action": "elasticloadbalancing:*",
            "Resource": "*",
            "Effect": "Allow"
        },
        {
            "Action": "sts:AssumeRole",
            "Resource": "*",
            "Effect": "Allow"
        },
        {
            "Action": [
                "iam:GetServerCertificate",
                "iam:DeleteServerCertificate",
                "iam:UploadServerCertificate"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}
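
An aside on the policy itself: "arn:aws:s3:::applatixtest3/*" matches only object ARNs, not the bucket ARN, and HeadBucket authorizes against the bucket (it requires s3:ListBucket); the "applatix-*" pattern does not match "applatixtest3" either. If the 403 came from this role, a bucket-level statement along these lines (a sketch) would be needed:

{
    "Action": "s3:ListBucket",
    "Resource": "arn:aws:s3:::applatixtest3",
    "Effect": "Allow"
}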

Here is the log from when I created the Kubernetes cluster and the bucket:

Francis-Macbook-Pro:argoKube francis$ cat env.sh
export BUCKET=$1
export KUBE_EDITOR=vim
export AWS_PROFILE=dev
export KOPS_STATE_STORE=s3://${BUCKET}
export AWS_SDK_LOAD_CONFIG=1
aws --profile dev s3 mb s3://${BUCKET}

Francis-Macbook-Pro:argoKube francis$ . env.sh applatixtest3
make_bucket: applatixtest3
Francis-Macbook-Pro:argoKube francis$ bash -x create_kops_cluster.sh t2
+ NAME=t2
+ export AWS_PROFILE=dev
+ AWS_PROFILE=dev
+ hosted_zone=test.eng.applatix.net
+ cluster_name=t2.test.eng.applatix.net
+ kops create cluster --zones=us-west-2c --node-size=m3.large --master-size=m3.large t2.test.eng.applatix.net
I1018 10:04:14.240851    8915 create_cluster.go:659] Inferred --cloud=aws from zone "us-west-2c"
I1018 10:04:14.241365    8915 create_cluster.go:845] Using SSH public key: /Users/francis/.ssh/id_rsa.pub
I1018 10:04:17.183355    8915 subnets.go:183] Assigned CIDR 172.20.32.0/19 to subnet us-west-2c
Previewing changes that will be made:

I1018 10:04:26.045961    8915 executor.go:91] Tasks: 0 done / 63 total; 34 can run
I1018 10:04:28.016706    8915 executor.go:91] Tasks: 34 done / 63 total; 12 can run
I1018 10:04:28.561207    8915 executor.go:91] Tasks: 46 done / 63 total; 15 can run
I1018 10:04:35.988260    8915 executor.go:91] Tasks: 61 done / 63 total; 2 can run
I1018 10:04:36.199585    8915 executor.go:91] Tasks: 63 done / 63 total; 0 can run
Will create resources:
  AutoscalingGroup/master-us-west-2c.masters.t2.test.eng.applatix.net
    MinSize                 1
    MaxSize                 1
    Subnets                 [name:us-west-2c.t2.test.eng.applatix.net]
    Tags                    {k8s.io/role/master: 1, Name: master-us-west-2c.masters.t2.test.eng.applatix.net, KubernetesCluster: t2.test.eng.applatix.net}
    LaunchConfiguration     name:master-us-west-2c.masters.t2.test.eng.applatix.net

  AutoscalingGroup/nodes.t2.test.eng.applatix.net
    MinSize                 2
    MaxSize                 2
    Subnets                 [name:us-west-2c.t2.test.eng.applatix.net]
    Tags                    {k8s.io/role/node: 1, Name: nodes.t2.test.eng.applatix.net, KubernetesCluster: t2.test.eng.applatix.net}
    LaunchConfiguration     name:nodes.t2.test.eng.applatix.net

  DHCPOptions/t2.test.eng.applatix.net
    DomainName              us-west-2.compute.internal
    DomainNameServers       AmazonProvidedDNS

  EBSVolume/c.etcd-events.t2.test.eng.applatix.net
    AvailabilityZone        us-west-2c
    VolumeType              gp2
    SizeGB                  20
    Encrypted               false
    Tags                    {KubernetesCluster: t2.test.eng.applatix.net, k8s.io/etcd/events: c/c, k8s.io/role/master: 1, Name: c.etcd-events.t2.test.eng.applatix.net}

  EBSVolume/c.etcd-main.t2.test.eng.applatix.net
    AvailabilityZone        us-west-2c
    VolumeType              gp2
    SizeGB                  20
    Encrypted               false
    Tags                    {k8s.io/etcd/main: c/c, k8s.io/role/master: 1, Name: c.etcd-main.t2.test.eng.applatix.net, KubernetesCluster: t2.test.eng.applatix.net}

  IAMInstanceProfile/masters.t2.test.eng.applatix.net

  IAMInstanceProfile/nodes.t2.test.eng.applatix.net

  IAMInstanceProfileRole/masters.t2.test.eng.applatix.net
    InstanceProfile         name:masters.t2.test.eng.applatix.net id:masters.t2.test.eng.applatix.net
    Role                    name:masters.t2.test.eng.applatix.net

  IAMInstanceProfileRole/nodes.t2.test.eng.applatix.net
    InstanceProfile         name:nodes.t2.test.eng.applatix.net id:nodes.t2.test.eng.applatix.net
    Role                    name:nodes.t2.test.eng.applatix.net

  IAMRole/masters.t2.test.eng.applatix.net
    ExportWithID            masters

  IAMRole/nodes.t2.test.eng.applatix.net
    ExportWithID            nodes

  IAMRolePolicy/masters.t2.test.eng.applatix.net
    Role                    name:masters.t2.test.eng.applatix.net

  IAMRolePolicy/nodes.t2.test.eng.applatix.net
    Role                    name:nodes.t2.test.eng.applatix.net

  InternetGateway/t2.test.eng.applatix.net
    VPC                     name:t2.test.eng.applatix.net
    Shared                  false

  Keypair/kops
    Subject                 o=system:masters,cn=kops
    Type                    client

  Keypair/kube-controller-manager
    Subject                 cn=system:kube-controller-manager
    Type                    client

  Keypair/kube-proxy
    Subject                 cn=system:kube-proxy
    Type                    client

  Keypair/kube-scheduler
    Subject                 cn=system:kube-scheduler
    Type                    client

  Keypair/kubecfg
    Subject                 o=system:masters,cn=kubecfg
    Type                    client

  Keypair/kubelet
    Subject                 o=system:nodes,cn=kubelet
    Type                    client

  Keypair/master
    Subject                 cn=kubernetes-master
    Type                    server
    AlternateNames          [100.64.0.1, 127.0.0.1, api.internal.t2.test.eng.applatix.net, api.t2.test.eng.applatix.net, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local]

  LaunchConfiguration/master-us-west-2c.masters.t2.test.eng.applatix.net
    ImageID                 kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28
    InstanceType            m3.large
    SSHKey                  name:kubernetes.t2.test.eng.applatix.net-86:e0:db:f4:d5:da:1e:f6:24:aa:9e:d7:2d:fb:77:3a id:kubernetes.t2.test.eng.applatix.net-86:e0:db:f4:d5:da:1e:f6:24:aa:9e:d7:2d:fb:77:3a
    SecurityGroups          [name:masters.t2.test.eng.applatix.net]
    AssociatePublicIP       true
    IAMInstanceProfile      name:masters.t2.test.eng.applatix.net id:masters.t2.test.eng.applatix.net
    RootVolumeSize          64
    RootVolumeType          gp2
    SpotPrice

  LaunchConfiguration/nodes.t2.test.eng.applatix.net
    ImageID                 kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28
    InstanceType            m3.large
    SSHKey                  name:kubernetes.t2.test.eng.applatix.net-86:e0:db:f4:d5:da:1e:f6:24:aa:9e:d7:2d:fb:77:3a id:kubernetes.t2.test.eng.applatix.net-86:e0:db:f4:d5:da:1e:f6:24:aa:9e:d7:2d:fb:77:3a
    SecurityGroups          [name:nodes.t2.test.eng.applatix.net]
    AssociatePublicIP       true
    IAMInstanceProfile      name:nodes.t2.test.eng.applatix.net id:nodes.t2.test.eng.applatix.net
    RootVolumeSize          128
    RootVolumeType          gp2
    SpotPrice

  ManagedFile/t2.test.eng.applatix.net-addons-bootstrap
    Location                addons/bootstrap-channel.yaml

  ManagedFile/t2.test.eng.applatix.net-addons-core.addons.k8s.io
    Location                addons/core.addons.k8s.io/v1.4.0.yaml

  ManagedFile/t2.test.eng.applatix.net-addons-dns-controller.addons.k8s.io-k8s-1.6
    Location                addons/dns-controller.addons.k8s.io/k8s-1.6.yaml

  ManagedFile/t2.test.eng.applatix.net-addons-dns-controller.addons.k8s.io-pre-k8s-1.6
    Location                addons/dns-controller.addons.k8s.io/pre-k8s-1.6.yaml

  ManagedFile/t2.test.eng.applatix.net-addons-kube-dns.addons.k8s.io-k8s-1.6
    Location                addons/kube-dns.addons.k8s.io/k8s-1.6.yaml

  ManagedFile/t2.test.eng.applatix.net-addons-kube-dns.addons.k8s.io-pre-k8s-1.6
    Location                addons/kube-dns.addons.k8s.io/pre-k8s-1.6.yaml

  ManagedFile/t2.test.eng.applatix.net-addons-limit-range.addons.k8s.io
    Location                addons/limit-range.addons.k8s.io/v1.5.0.yaml

  ManagedFile/t2.test.eng.applatix.net-addons-storage-aws.addons.k8s.io
    Location                addons/storage-aws.addons.k8s.io/v1.6.0.yaml

  Route/0.0.0.0/0
    RouteTable              name:t2.test.eng.applatix.net
    CIDR                    0.0.0.0/0
    InternetGateway         name:t2.test.eng.applatix.net

  RouteTable/t2.test.eng.applatix.net
    VPC                     name:t2.test.eng.applatix.net

  RouteTableAssociation/us-west-2c.t2.test.eng.applatix.net
    RouteTable              name:t2.test.eng.applatix.net
    Subnet                  name:us-west-2c.t2.test.eng.applatix.net

  SSHKey/kubernetes.t2.test.eng.applatix.net-86:e0:db:f4:d5:da:1e:f6:24:aa:9e:d7:2d:fb:77:3a
    KeyFingerprint          e7:9b:b3:45:57:d7:45:0e:bf:50:5a:0a:bc:5e:31:4a

  Secret/admin

  Secret/kube

  Secret/kube-proxy

  Secret/kubelet

  Secret/system:controller_manager

  Secret/system:dns

  Secret/system:logging

  Secret/system:monitoring

  Secret/system:scheduler

  SecurityGroup/masters.t2.test.eng.applatix.net
    Description             Security group for masters
    VPC                     name:t2.test.eng.applatix.net
    RemoveExtraRules        [port=22, port=443, port=4001, port=4789, port=179]

  SecurityGroup/nodes.t2.test.eng.applatix.net
    Description             Security group for nodes
    VPC                     name:t2.test.eng.applatix.net
    RemoveExtraRules        [port=22]

  SecurityGroupRule/all-master-to-master
    SecurityGroup           name:masters.t2.test.eng.applatix.net
    SourceGroup             name:masters.t2.test.eng.applatix.net

  SecurityGroupRule/all-master-to-node
    SecurityGroup           name:nodes.t2.test.eng.applatix.net
    SourceGroup             name:masters.t2.test.eng.applatix.net

  SecurityGroupRule/all-node-to-node
    SecurityGroup           name:nodes.t2.test.eng.applatix.net
    SourceGroup             name:nodes.t2.test.eng.applatix.net

  SecurityGroupRule/https-external-to-master-0.0.0.0/0
    SecurityGroup           name:masters.t2.test.eng.applatix.net
    CIDR                    0.0.0.0/0
    Protocol                tcp
    FromPort                443
    ToPort                  443

  SecurityGroupRule/master-egress
    SecurityGroup           name:masters.t2.test.eng.applatix.net
    CIDR                    0.0.0.0/0
    Egress                  true

  SecurityGroupRule/node-egress
    SecurityGroup           name:nodes.t2.test.eng.applatix.net
    CIDR                    0.0.0.0/0
    Egress                  true

  SecurityGroupRule/node-to-master-tcp-1-4000
    SecurityGroup           name:masters.t2.test.eng.applatix.net
    Protocol                tcp
    FromPort                1
    ToPort                  4000
    SourceGroup             name:nodes.t2.test.eng.applatix.net

  SecurityGroupRule/node-to-master-tcp-4003-65535
    SecurityGroup           name:masters.t2.test.eng.applatix.net
    Protocol                tcp
    FromPort                4003
    ToPort                  65535
    SourceGroup             name:nodes.t2.test.eng.applatix.net

  SecurityGroupRule/node-to-master-udp-1-65535
    SecurityGroup           name:masters.t2.test.eng.applatix.net
    Protocol                udp
    FromPort                1
    ToPort                  65535
    SourceGroup             name:nodes.t2.test.eng.applatix.net

  SecurityGroupRule/ssh-external-to-master-0.0.0.0/0
    SecurityGroup           name:masters.t2.test.eng.applatix.net
    CIDR                    0.0.0.0/0
    Protocol                tcp
    FromPort                22
    ToPort                  22

  SecurityGroupRule/ssh-external-to-node-0.0.0.0/0
    SecurityGroup           name:nodes.t2.test.eng.applatix.net
    CIDR                    0.0.0.0/0
    Protocol                tcp
    FromPort                22
    ToPort                  22

  Subnet/us-west-2c.t2.test.eng.applatix.net
    VPC                     name:t2.test.eng.applatix.net
    AvailabilityZone        us-west-2c
    CIDR                    172.20.32.0/19
    Shared                  false
    Tags                    {kubernetes.io/cluster/t2.test.eng.applatix.net: owned, KubernetesCluster: t2.test.eng.applatix.net, Name: us-west-2c.t2.test.eng.applatix.net}

  VPC/t2.test.eng.applatix.net
    CIDR                    172.20.0.0/16
    EnableDNSHostnames      true
    EnableDNSSupport        true
    Shared                  false
    Tags                    {KubernetesCluster: t2.test.eng.applatix.net, Name: t2.test.eng.applatix.net, kubernetes.io/cluster/t2.test.eng.applatix.net: owned}

  VPCDHCPOptionsAssociation/t2.test.eng.applatix.net
    VPC                     name:t2.test.eng.applatix.net
    DHCPOptions             name:t2.test.eng.applatix.net

Must specify --yes to apply changes

Cluster configuration has been created.

Suggestions:
 * list clusters with: kops get cluster
 * edit this cluster with: kops edit cluster t2.test.eng.applatix.net
 * edit your node instance group: kops edit ig --name=t2.test.eng.applatix.net nodes
 * edit your master instance group: kops edit ig --name=t2.test.eng.applatix.net master-us-west-2c

Finally configure your cluster with: kops update cluster t2.test.eng.applatix.net --yes

+ kops create ig --subnet us-west-2c --name=t2.test.eng.applatix.net morenodes
+ kops edit ig --name=t2.test.eng.applatix.net nodes
+ kops edit ig --name=t2.test.eng.applatix.net morenodes
+ kops update cluster t2.test.eng.applatix.net --yes
I1018 10:05:53.141605    8964 executor.go:91] Tasks: 0 done / 65 total; 34 can run
I1018 10:05:54.660499    8964 vfs_castore.go:422] Issuing new certificate: "kube-scheduler"
I1018 10:05:54.768046    8964 vfs_castore.go:422] Issuing new certificate: "kubecfg"
I1018 10:05:54.914255    8964 vfs_castore.go:422] Issuing new certificate: "kops"
I1018 10:05:55.102562    8964 vfs_castore.go:422] Issuing new certificate: "kubelet"
I1018 10:05:55.385285    8964 vfs_castore.go:422] Issuing new certificate: "kube-proxy"
I1018 10:05:55.573671    8964 vfs_castore.go:422] Issuing new certificate: "kube-controller-manager"
I1018 10:05:55.598097    8964 vfs_castore.go:422] Issuing new certificate: "master"
I1018 10:05:58.754347    8964 executor.go:91] Tasks: 34 done / 65 total; 12 can run
I1018 10:06:00.906054    8964 executor.go:91] Tasks: 46 done / 65 total; 16 can run
I1018 10:06:07.524301    8964 launchconfiguration.go:327] waiting for IAM instance profile "masters.t2.test.eng.applatix.net" to be ready
I1018 10:06:07.685986    8964 launchconfiguration.go:327] waiting for IAM instance profile "nodes.t2.test.eng.applatix.net" to be ready
I1018 10:06:08.454738    8964 launchconfiguration.go:327] waiting for IAM instance profile "nodes.t2.test.eng.applatix.net" to be ready
I1018 10:06:19.125192    8964 executor.go:91] Tasks: 62 done / 65 total; 3 can run
I1018 10:06:20.296012    8964 executor.go:91] Tasks: 65 done / 65 total; 0 can run
I1018 10:06:20.297333    8964 dns.go:152] Pre-creating DNS records
I1018 10:06:24.066678    8964 update_cluster.go:247] Exporting kubecfg for cluster
Kops has set your kubectl context to t2.test.eng.applatix.net

Cluster is starting.  It should be ready in a few minutes.

Suggestions:
 * validate cluster: kops validate cluster
 * list nodes: kubectl get nodes --show-labels
 * ssh to the master: ssh -i ~/.ssh/id_rsa admin@api.t2.test.eng.applatix.net
The admin user is specific to Debian. If not using Debian please use the appropriate user based on your OS.
 * read about installing addons: https://github.com/kubernetes/kops/blob/master/docs/addons.md

+ export KUBECONFIG=/Users/francis/.kube/t2.test.eng.applatix.net
+ KUBECONFIG=/Users/francis/.kube/t2.test.eng.applatix.net
+ kops export kubecfg --name t2.test.eng.applatix.net '--config=~/Users/francis/.kube/t2.test.eng.applatix.net'
Kops has set your kubectl context to t2.test.eng.applatix.net

shrinandj commented 6 years ago

I created a cluster using kops and tried installing Argo using the same bucket (applatixtest3). It seems to work just fine... Will dig deeper.

shrinandj commented 6 years ago

Francis found that the detected region of the cluster bucket differs between his attempts and mine. For me, the installer correctly detected the region of the bucket as "us-west-2", whereas for him it was detected as "None".
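
For reference, the bucket's region can be looked up explicitly via the S3 GetBucketLocation API (a boto3 sketch; a null LocationConstraint means us-east-1, and this call can itself return a 403 if the credentials lack s3:GetBucketLocation):

import boto3

s3 = boto3.client("s3")
resp = s3.get_bucket_location(Bucket="applatixtest3")
# LocationConstraint is None for buckets in us-east-1.
region = resp.get("LocationConstraint") or "us-east-1"
print("bucket region:", region)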

francis-ax commented 6 years ago

I double-checked that "aws configure" also shows us-west-2 as my default region. Not sure where the installation picks up None as the region?


argo cluster ops> aws configure list
      Name                    Value             Type    Location
      ----                    -----             ----    --------
   profile                <not set>             None    None
access_key     ****************7VJQ shared-credentials-file
secret_key     ****************7c7B shared-credentials-file
    region                us-west-2      config-file    ~/.aws/config
argo cluster ops> argocluster install-argo-only --cloud-region us-west-2 --cluster-name t2.test.eng.applatix.net --cloud-provider aws --cloud-profile dev --cluster-bucket applatixtest3 --kubeconfig /tmp/ax_kube/config
2017-10-18T22:42:06 INFO ax.cluster_management.argo_cluster_manager MainThread: Installing Argo platform ...
2017-10-18T22:42:06 INFO ax.cluster_management.argo_cluster_manager MainThread: s3 bucket endpoint: None
2017-10-18T22:42:07 INFO ax.cluster_management.app.options.install_options MainThread: Cloud placement not provided, setting it to us-west-2a from currently available zones ['us-west-2a', 'us-west-2b', 'us-west-2c']
2017-10-18T22:42:07 INFO ax.meta.cluster_id MainThread: Instantiating cluster bucket ...
2017-10-18T22:42:15 INFO ax.cloud.aws.aws_s3 MainThread: Using region None for bucket applatixtest3
2017-10-18T22:42:24 INFO ax.cluster_management.app.common MainThread: Cannot find cluster name id: An error occurred (403) when calling the HeadBucket operation: Forbidden. Cluster is not yet created.
2017-10-18T22:42:24 INFO ax.meta.cluster_id MainThread: Cluster id not provided, generate one.
2017-10-18T22:42:24 INFO ax.meta.cluster_id MainThread: Created new name-id t2.test.eng.applatix.net-9c2aa9ec-b455-11e7-af2d-025000000001
2017-10-18T22:42:24 INFO ax.meta.config_s3_path MainThread: Using AX cluster config path applatixtest3
2017-10-18T22:42:27 INFO ax.cloud.aws.aws_s3 MainThread: Using region None for bucket applatixtest3
2017-10-18T22:42:31 INFO ax.cloud.aws.aws_s3 MainThread: Using region None for bucket applatixtest3
2017-10-18T22:42:31 INFO ax.platform.ax_cluster_info MainThread: Downloading cluster current state ...
2017-10-18T22:42:40 ERROR ax.cluster_management.argo_cluster_manager MainThread: An error occurred (403) when calling the HeadBucket operation: Forbidden
Traceback (most recent call last):
  File "/ax/python/ax/cluster_management/argo_cluster_manager.py", line 86, in parse_args_and_run
    getattr(self, cmd)(args)
  File "/ax/python/ax/cluster_management/argo_cluster_manager.py", line 265, in install_argo_only
    PlatformOnlyInstaller(platform_install_config).run()
  File "/ax/python/ax/cluster_management/app/cluster_installer.py", line 514, in __init__
    self._ci_installer = ClusterInstaller(cfg=self._cfg.get_install_config(), kubeconfig=self._cfg.kube_config)
  File "/ax/python/ax/cluster_management/app/cluster_installer.py", line 70, in __init__
    dry_run=self._cfg.dry_run
  File "/ax/python/ax/cluster_management/app/common.py", line 83, in __init__
    self._csm = ClusterStateMachine(cluster_name_id=self._idobj.get_cluster_name_id(), cloud_profile=cloud_profile)
  File "/ax/python/ax/cluster_management/app/state/state.py", line 45, in __init__
    current_state = self._cluster_info.download_cluster_current_state() or ClusterState.UNKNOWN
  File "/ax/python/ax/platform/ax_cluster_info.py", line 260, in download_cluster_current_state
    return self._bucket.get_object(key=self._s3_cluster_current_state)
  File "/ax/python/ax/cloud/aws/aws_s3.py", line 365, in get_object
    if not self.exists():
  File "/ax/python/ax/cloud/aws/aws_s3.py", line 285, in exists
    return self._exists()
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 212, in call
    raise attempt.get()
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/ax/python/ax/cloud/aws/aws_s3.py", line 565, in _exists
    raise ce
ClientError: An error occurred (403) when calling the HeadBucket operation: Forbidden

 !!! Operation failed due to runtime error: An error occurred (403) when calling the HeadBucket operation: Forbidden
shrinandj commented 6 years ago

I think I know what's wrong here! The installer expects an AWS profile called "default" and expects this profile to have enough privileges to query AWS (S3 specifically). I have the "default" profile set up, which is why this works for me, but your default profile probably doesn't have these privileges.

I’ll add a “--cloud-profile” option to override this…
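
For context, the profile difference is easy to see directly in boto3 (a sketch; "dev" is the profile from the logs above):

import boto3

# Without profile_name, boto3 resolves credentials from AWS_PROFILE or the
# "default" profile in ~/.aws/credentials -- what the installer does today.
default_session = boto3.session.Session()
# An explicit profile picks up the intended credentials instead.
dev_session = boto3.session.Session(profile_name="dev")

for name, session in (("default", default_session), ("dev", dev_session)):
    creds = session.get_credentials()
    suffix = creds.access_key[-4:] if creds else "none"
    print(name, "-> access key ending in", suffix)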

shrinandj commented 6 years ago

This should work now that 64c2698 has been checked in.