aws / aws-cli

Universal Command Line Interface for Amazon Web Services

EKS authentication slower with aws-cli than aws-iam-authenticator #5570

Open emaincourt opened 3 years ago

emaincourt commented 3 years ago


Describe the bug
When using the aws-cli, it takes around 5 seconds to get a proper token to communicate with our EKS clusters. Performing the same action with aws-iam-authenticator takes less than a second.

SDK version number
aws-cli/2.0.50, aws-iam-authenticator/0.5.1

Platform/OS/Hardware/Device
Both macOS and Ubuntu

To Reproduce (observed behavior)

time aws-iam-authenticator token -i eks-staging
{"kind":"ExecCredential","apiVersion":"client.authentication.k8s.io/v1alpha1","spec":{},"status":{"expirationTimestamp":"2020-09-24T13:53:42Z","token":"<REDACTED>"}}
aws-iam-authenticator token -i eks-staging  0.09s user 0.05s system 14% cpu 0.966 total

time aws eks get-token --cluster-name eks-staging
{"kind": "ExecCredential", "apiVersion": "client.authentication.k8s.io/v1alpha1", "spec": {}, "status": {"expirationTimestamp": "2020-09-24T14:03:47Z", "token": "<REDACTED>"}}
aws eks get-token --cluster-name eks-staging  0.71s user 0.23s system 21% cpu 4.347 total

Expected behavior
Both should respond in roughly the same time.


kdaily commented 3 years ago

Hi @emaincourt, thanks for reporting.

This sounds like an enhancement, not a bug. We would have to investigate what the implementation of aws-iam-authenticator does.

Have you tested this in AWS CLI v1 to see if this is a change introduced in the AWS CLI v2?

nielsole commented 3 years ago
$ time aws eks get-token --cluster-name [...]
real    0m2,565s
user    0m0,515s
sys 0m0,049s

$ time aws-iam-authenticator token -i [...]
real    0m0,019s
user    0m0,014s
sys 0m0,007s

$ aws --version
aws-cli/2.0.59 Python/3.7.3 Linux/5.8.16-050816-generic exe/x86_64.ubuntu.20

Just from the timing it looks like aws CLI is fetching a new token, while aws-iam-authenticator is just reusing my existing one.

Adding 2.5 seconds on top of every kubectl call is infeasible for my development needs. I would plead for treating this as a bug, since the command is advertised as a drop-in replacement for aws-iam-authenticator: "you don't need to install the authenticator. Instead, you can use the aws eks get-token command"

nielsole commented 3 years ago

Looks like it tries to contact the metadata server twice, each with a 1-second timeout, despite AWS_PROFILE being set and ~/.aws/config and ~/.aws/credentials being present.

[pid 23847] connect(5, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("169.254.169.254")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 23847] clock_gettime(CLOCK_MONOTONIC, {tv_sec=13203, tv_nsec=435005554}) = 0
[pid 23847] poll([{fd=5, events=POLLOUT|POLLERR}], 1, 1000) = 0 (Timeout)
[pid 23847] close(5)                    = 0
[pid 23847] clock_gettime(CLOCK_MONOTONIC, {tv_sec=13204, tv_nsec=437736837}) = 0
[pid 23847] socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 5
[pid 23847] ioctl(5, FIOCLEX)           = 0
[pid 23847] ioctl(5, FIONBIO, [1])      = 0
[pid 23847] connect(5, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("169.254.169.254")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 23847] clock_gettime(CLOCK_MONOTONIC, {tv_sec=13204, tv_nsec=438882515}) = 0
[pid 23847] poll([{fd=5, events=POLLOUT|POLLERR}], 1, 1000) = 0 (Timeout)
[pid 23847] close(5)
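The strace above amounts to two serial connect attempts, each blocking for the full 1-second poll timeout. A minimal Python sketch of that arithmetic (function names are hypothetical, not botocore's):

```python
import time

def fetch_credentials(connect, attempts=2, timeout=1.0):
    # Sketch: each failed IMDS attempt blocks for the full timeout
    # before the next retry, so two attempts with a 1-second timeout
    # add roughly 2 seconds to every invocation.
    for _ in range(attempts):
        try:
            return connect(timeout)
        except TimeoutError:
            continue  # poll([...], 1, 1000) = 0 (Timeout) in the strace
    return None

def unreachable_imds(timeout):
    time.sleep(timeout)  # stand-in for the blocked connect to 169.254.169.254
    raise TimeoutError

start = time.monotonic()
result = fetch_credentials(unreachable_imds)
elapsed = time.monotonic() - start
print(result, f"{elapsed:.1f}s")
```

This matches the ~2.5 s gap measured above: the token request itself is fast, but the credential chain burns two full timeouts first.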
nielsole commented 3 years ago

Setting AWS_EC2_METADATA_DISABLED=true (undocumented?), found via https://github.com/aws/aws-cli/issues/5623, reduces aws eks get-token to ~500ms (which is still 25x slower, but not 125x slower ;) )
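For reference, the check appears to be a plain case-insensitive string comparison on the environment variable; a sketch of the behavior (not the actual botocore source):

```python
import os

def imds_disabled(environ=None):
    # Sketch: AWS_EC2_METADATA_DISABLED=true (any case) suppresses the
    # probe of 169.254.169.254 entirely, so the credential chain skips
    # straight to config-file/profile credentials.
    environ = os.environ if environ is None else environ
    return environ.get("AWS_EC2_METADATA_DISABLED", "false").lower() == "true"

print(imds_disabled({"AWS_EC2_METADATA_DISABLED": "True"}))  # True
print(imds_disabled({}))                                     # False
```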

stephank commented 3 years ago

The metadata server stuff is interesting. The other major factor is just loading all of the Python code, I think. I measure ~0.85s for aws eks get-token ..., compared to ~0.75s for just aws (which only prints usage). A bare print('hello') in my Python is ~0.05s. Solving this sounds tricky, but I don't know Python well; maybe there are some magic ways to tune this?
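The fixed interpreter-plus-import overhead described above is easy to isolate; a rough harness (assumes a local Python, numbers vary by machine):

```python
import subprocess
import sys
import time

def wall_time(argv):
    # Time a full process launch, the way `time aws ...` does from a shell
    start = time.monotonic()
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    return time.monotonic() - start

bare = wall_time([sys.executable, "-c", "print('hello')"])
# On the machines above: bare interpreter ~0.05 s, `aws` with the full
# awscli package imported ~0.75 s, i.e. most of the latency is import time.
print(f"bare interpreter: {bare:.3f}s")
```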

Maybe another solution is to add a flag to aws eks update-kubeconfig to have it generate a config that uses aws-iam-authenticator? Or perhaps even default to it, if it is found in $PATH?

chris-oconnor-plurilock commented 2 years ago

I also have the exact same problem. We have just moved over to AWS SSO, so we are forced to use aws eks get-token to gain CLI access to the EKS cluster. I would suggest this is actually a regression, since I can no longer use aws-iam-authenticator. The speed difference is very annoying.

dhensen commented 2 years ago

I ended up here after investigating why my pytest took so long to initialize. First I thought it was WSL on a corporate laptop, but it was as slow on my own 5950X. It turns out calls to the IP address mentioned in the comments above add 8 to 10 seconds to my pytest initialization. I'm using boto3, which shares its underlying credential resolution (botocore) with the aws-cli. So I'm still not 100% sure, but at least adding AWS_EC2_METADATA_DISABLED=true gives some relief. I do wonder what I am now missing out on...?
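One way to apply the workaround to a whole pytest run is to set the variable before anything imports boto3/botocore, e.g. in a top-level conftest.py (hypothetical snippet):

```python
# conftest.py (hypothetical): disable the EC2 metadata probe for local
# test runs so botocore's credential chain fails fast instead of
# waiting on 169.254.169.254 connect timeouts.
import os

os.environ.setdefault("AWS_EC2_METADATA_DISABLED", "true")
```

setdefault keeps any value already exported in the shell, so CI jobs that genuinely run on EC2 can still opt back in.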

aogier commented 2 years ago

I'm facing the same problem and I see that, for reasons beyond my comprehension, apt-cache policy gets called at every aws invocation, including get-token:

strace -fe file aws ecr describe-repositories 2>&1 | grep apt-cache
[pid 213326] execve("/home/oggei/.pyenv/shims/apt-cache", ["apt-cache", "policy"], 0x7f1c6ee78210 /* 58 vars */) = -1 ENOENT (No such file or directory)
[pid 213326] execve("/home/oggei/.pyenv/bin/apt-cache", ["apt-cache", "policy"], 0x7f1c6ee78210 /* 58 vars */) = -1 ENOENT (No such file or directory)
[pid 213326] execve("/home/oggei/.zinit/polaris/bin/apt-cache", ["apt-cache", "policy"], 0x7f1c6ee78210 /* 58 vars */) = -1 ENOENT (No such file or directory)
[pid 213326] execve("/home/oggei/.poetry/bin/apt-cache", ["apt-cache", "policy"], 0x7f1c6ee78210 /* 58 vars */) = -1 ENOENT (No such file or directory)
[pid 213326] execve("/home/oggei/.local/bin/apt-cache", ["apt-cache", "policy"], 0x7f1c6ee78210 /* 58 vars */) = -1 ENOENT (No such file or directory)
[pid 213326] execve("/home/oggei/.poetry/bin/apt-cache", ["apt-cache", "policy"], 0x7f1c6ee78210 /* 58 vars */) = -1 ENOENT (No such file or directory)
[pid 213326] execve("/home/oggei/.cargo/bin/apt-cache", ["apt-cache", "policy"], 0x7f1c6ee78210 /* 58 vars */) = -1 ENOENT (No such file or directory)
[pid 213326] execve("/home/oggei/.local/bin/apt-cache", ["apt-cache", "policy"], 0x7f1c6ee78210 /* 58 vars */) = -1 ENOENT (No such file or directory)
[pid 213326] execve("/usr/local/bin/apt-cache", ["apt-cache", "policy"], 0x7f1c6ee78210 /* 58 vars */) = -1 ENOENT (No such file or directory)
[pid 213326] execve("/usr/bin/apt-cache", ["apt-cache", "policy"], 0x7f1c6ee78210 /* 58 vars */) = 0

apt-cache policy is indeed slow:

time apt-cache policy > /dev/null
apt-cache policy > /dev/null  5.18s user 0.23s system 100% cpu 5.409 total

and so is any kubectl command:

time k get no > /dev/null
k get no > /dev/null  5.56s user 0.23s system 100% cpu 5.710 total