awslabs / amazon-ecr-credential-helper

Automatically gets credentials for Amazon ECR on docker push/docker pull
Apache License 2.0

Slow docker commands with: credsStore and credHelpers #137

Open pecigonzalo opened 5 years ago

pecigonzalo commented 5 years ago

Similar to the error reported in https://github.com/docker/cli/issues/1591, the ecr-login helper seems to take a really long time to time out when it cannot find credentials. I could not find any parameter in the SDK to optimize this workflow, but I think it would be a good idea to "return" faster from the credential search if no creds are found. Unfortunately, the logs do not show which section is taking so long, but I assume it is trying to connect to the introspective API and waiting for a timeout there.

samuelkarp commented 5 years ago

@pecigonzalo Can you tell me a bit more about how you're using the credential helper and how you have it configured? I suspect that delay is because the credential helper is trying to grab credentials from the link-local address 169.254.169.254 (the address used by the EC2 instance metadata service) and the timeout is set too long.

pecigonzalo commented 5 years ago

Samuel, most of it is explained in the linked Docker issue. I suspect the same thing I mentioned in the main message: I think it's a timeout on the introspection API (169.254.169.254).

This is on a workstation, not a server, and we don't have the creds available all the time; we use aws-vault to only expose credentials as required. I understand this is not the 80% use case, but covering it would be ideal.

samuelkarp commented 5 years ago

Do you have credentials in your ~/.aws/credentials, do you pass credentials as environment variables, or something else?

pecigonzalo commented 5 years ago

As mentioned, the error happens when there are no credentials. I'm not even using an ECR repo; that is the issue. There are two parts:

A) docker still tries to get credentials for a repo not related to what we are trying to build (docker issue)
B) ecr-login takes too long to time out when there are no credentials present

What I think we should tune is that timeout.

samuelkarp commented 5 years ago

A) docker still tries to get credentials for a repo not related to what we are trying to build (docker issue)

Yep, and that is covered in your issue https://github.com/docker/cli/issues/1591. ~There are a few other issues that cover that scenario as well, but I'm having trouble finding the links right now.~ (Edit: it looks like https://github.com/moby/moby/pull/32967 was an attempt to fix the situation, but got closed without a follow-up.)

B) ecr-login takes too long to time out when there are no credentials present

When you do provide credentials (since you have the credential helper installed, I'm assuming you do use it at some point), what mechanism do you use to provide them? I'm trying to understand which credential handler in the chain is normally used, since that is what keeps the long delay from occurring in all cases and makes it show up only when you're not providing credentials.
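
For context, a minimal sketch of what such a provider chain looks like with aws-sdk-go v1 (not the helper's actual code; names are illustrative): environment variables are tried first, then the shared credentials file, and only then IMDS.

package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/credentials/ec2rolecreds"
	"github.com/aws/aws-sdk-go/aws/ec2metadata"
	"github.com/aws/aws-sdk-go/aws/session"
)

func main() {
	sess := session.Must(session.NewSession())

	// Providers are consulted in order; IMDS is last, so it is only queried
	// when neither env vars nor ~/.aws/credentials yield keys -- which is
	// exactly the case where the long timeout shows up off EC2.
	chain := credentials.NewChainCredentials([]credentials.Provider{
		&credentials.EnvProvider{},               // AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
		&credentials.SharedCredentialsProvider{}, // ~/.aws/credentials
		&ec2rolecreds.EC2RoleProvider{Client: ec2metadata.New(sess)},
	})

	if _, err := chain.Get(); err != nil {
		fmt.Println("no credentials found:", err)
	}
}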

pecigonzalo commented 5 years ago

@samuelkarp I know the docker part is covered in the linked issue; that is why I linked and referenced it in relation to the docker side of the problem.

Now regarding ecr-login credentials, I use either env vars or the introspective API, both provided by https://github.com/99designs/aws-vault most of the time. If you are not familiar with aws-vault, it basically stores the keys in some backend and then exposes them either as env vars (e.g. aws-vault exec profile -- command) or via a local metadata endpoint (169.254.169.254) with aws-vault exec -s profile.

When we do provide credentials, most of the time we do it with env vars (either aws-vault or some other way) and it works without problems.

I have also tried the workaround provided in https://github.com/awslabs/amazon-ecr-credential-helper/issues/9 but it did not help:

{
    "credsStore": "secretservice",
    "credHelpers": {
        "123123.dkr.ecr.eu-central-1.amazonaws.com": "ecr-login"
    },
    "auths": {
        "123123.dkr.ecr.eu-central-1.amazonaws.com": {}
    }
}

I believe this is because of https://github.com/docker/cli/blob/ea836abed5ba9c62c3d4444ea2a6bbf9b486ef1a/cli/command/image/build.go#L386. The moby issue seems related, but I'm not 100% sure it would fix this problem.

samuelkarp commented 5 years ago

Thanks, so it sounds like you're using both the environment variables credentials handler and the IMDS credentials handler. I want to make sure that changing the timeout here doesn't break your use-case with the IMDS credentials handler.

I'll mark this as a bug and when a fix is available, I'd appreciate it if you can test it.

samuelkarp commented 5 years ago

As a workaround, you can block that particular IP address locally:

sudo route add -host 169.254.169.254 reject

Blocking the IP address should cause the lookup to fail immediately, instead of taking a while to time out.

You can reverse that by running this command:

sudo route del -host 169.254.169.254 reject

pecigonzalo commented 5 years ago

I can certainly test any future patches, and also provide a test setup if that would help.

The IP-blocking workaround works perfectly; that will certainly help.

Personally, I think most people use either the credentials file (~/.aws) or env vars on workstations. I would expect that just removing the IMDS provider (I didn't know that shortname) would break some other use cases, but it might be safe to reduce its timeout to something more sensible, since most of the time IMDS responds very fast anyway.

To be honest, this would not even be a problem if docker was handling the creds correctly.

samuelkarp commented 5 years ago

Yes, I don't want to remove any of the credential handlers from the chain, but hopefully we can customize the IMDS provider to shorten the timeout.
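
A possible shape for that change, sketched against aws-sdk-go v1 rather than taken from any actual patch: give the EC2 metadata client its own HTTP client with a sub-second timeout and hand it to the IMDS provider.

package main

import (
	"net/http"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/credentials/ec2rolecreds"
	"github.com/aws/aws-sdk-go/aws/ec2metadata"
	"github.com/aws/aws-sdk-go/aws/session"
)

func main() {
	sess := session.Must(session.NewSession())

	// Give the metadata client its own short HTTP timeout so a missing IMDS
	// (i.e. not running on EC2) fails in well under a second instead of
	// waiting out the SDK's default client timeout and retries.
	metaClient := ec2metadata.New(sess, &aws.Config{
		HTTPClient: &http.Client{Timeout: 500 * time.Millisecond},
		MaxRetries: aws.Int(1),
	})

	creds := credentials.NewCredentials(&ec2rolecreds.EC2RoleProvider{Client: metaClient})
	_ = creds // wire into the rest of the provider chain as needed
}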

maxschae4 commented 2 years ago

This bug also occurs when ecr-login is specified and no AWS_PROFILE environment variable or default ~/.aws/credentials are provided.

My docker build command would hang before sending context at seemingly random times. This sent me down several rabbit holes looking for solutions before I did some stracing and wound up here.

I'm not always doing things that rely on ECR, and certainly don't rely on logging in, so this is really terrible behavior. There must be a better way to establish that we're not running in EC2 (and don't have the instance metadata service available) than trying and failing for a full minute plus.
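
One hypothetical way to make that check cheap, purely as a sketch and not something the helper actually does, would be a short-deadline TCP dial against the metadata address before ever invoking the SDK's IMDS provider:

package main

import (
	"fmt"
	"net"
	"time"
)

// imdsReachable is a hypothetical probe: try a TCP connection to the instance
// metadata address with a short deadline, so machines outside EC2 fail fast
// instead of sitting through the SDK's default retries.
func imdsReachable() bool {
	conn, err := net.DialTimeout("tcp", "169.254.169.254:80", 500*time.Millisecond)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	fmt.Println("IMDS reachable:", imdsReachable())
}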

My workaround: I copied one of my sets of AWS credentials with ECR access to [default] in ~/.aws/credentials. You might not want to do this either, because it could lead to unexpected behavior for other unrelated AWS things (I happen to have a reasonably safe default). I had previously avoided setting default credentials because AWS actions should be intentional.

ankon commented 1 year ago

I have previously avoided setting default credentials because AWS actions should be intentional.

Hit the same issue (including the rabbit holes), and I would agree here: I do not want to set default credentials, there's just too much chance of borking something when you do that.