cloudyr / aws.signature

Amazon Web Services Request Signatures
https://cloud.r-project.org/package=aws.signature
31 stars 33 forks source link

locate_credentials() is slow #33

Open leeper opened 6 years ago

leeper commented 6 years ago

Example from https://github.com/cloudyr/aws.s3/issues/166

Report from Vitalina Komashko:

system.time(aws.s3::copy_object(from_object = from_object[1], to_object = to_object[1],
from_bucket = from_bucket, to_bucket = to_bucket, verbose = TRUE))
Using Environment Variable 'AWS_ACCESS_KEY_ID' for AWS Access Key ID
Using Environment Variable 'AWS_SECRET_ACCESS_KEY' for AWS Secret Access Key

Checking bucket region using get_location('<REDACTED>')
Executing request using bucket region us-west-1
S3 Request URL: https://s3-us-west-1.amazonaws.com/<REDACTED>
Executing request with AWS credentials
Parsing AWS API response
Success: (200) OK
    user   system  elapsed
   0.836    0.674 2437.777

This runs about 10x slower than the AWS CLI. Issue may be with locate_credentials():

/usr/bin/time -l aws s3 cp s3://<REDACTED> --profile corporate
copy: s3://<REDACTED> to s3://<REDACTED>

      263.67 real         2.24 user         0.65 sys
leeper commented 6 years ago

Ideas for refactoring:

leeper commented 6 years ago

We also get a decent improvement on signature_v4_auth() from v0.4.4:

> bench::mark(
+   is.list(signature_v4_auth("foo", "bar", verb = "GET", service = "s3", action = "/", request_body = "", canonical_headers = list("x-amz-region" = "foo")))
+ )
# A tibble: 1 x 14
  expression    min   mean median    max `itr/sec` mem_alloc  n_gc n_itr total_time result memory time  gc   
  <chr>      <bch:> <bch:> <bch:> <bch:>     <dbl> <bch:byt> <dbl> <int>   <bch:tm> <list> <list> <lis> <lis>
1 "is.list(~ 15.1ms 23.9ms 25.5ms 35.6ms      41.8    35.1KB     1    20      478ms <lgl ~ <Rpro~ <bch~ <tib~

to v0.4.5:

> bench::mark(
+   is.list(signature_v4_auth("foo", "bar", verb = "GET", service = "s3", action = "/", request_body = "", canonical_headers = list("x-amz-region" = "foo"))),
+   is.list(signature_v4_auth("foo", "bar", verb = "GET", service = "s3", action = "/", request_body = "", canonical_headers = list("x-amz-region" = "foo"), force_credentials = TRUE))
+ )
# A tibble: 2 x 14
  expression    min   mean median    max `itr/sec` mem_alloc  n_gc n_itr total_time result memory time  gc   
  <chr>      <bch:> <bch:> <bch:> <bch:>     <dbl> <bch:byt> <dbl> <int>   <bch:tm> <list> <list> <lis> <lis>
1 "is.list(~ 2.52ms 2.85ms 2.79ms 3.85ms      350.    1.38MB     9   151      431ms <lgl ~ <Rpro~ <bch~ <tib~
2 "is.list(~ 2.48ms 2.84ms 2.73ms 8.25ms      352.   19.88KB     9   149      423ms <lgl ~ <Rpro~ <bch~ <tib~
colman-humphrey commented 5 years ago

I had an issue with very slow loads, permeating even calling library(aws.s3), since that would load aws.signature. On my end, the issue, was that check_ec2() is the bottleneck, since it makes a URL call-out that I guess gets retried a bunch of times? And this is called in .onLoad. Either way, it was taking a long time. I'm not sure if other people have this issue, as I only got it as long as aws.ec2metadata was installed on a non-ec2 instances (for testing), which was of course a terrible idea.

However, removing aws.ec2metadata didn't solve all my issues. I'll make a new issue hopefully outlining what I think is happening at least in my case.

micwalk commented 5 years ago

I opened a ticket within the aws.ec2metadata repo here for the slow is_ec2() call: https://github.com/cloudyr/aws.ec2metadata/issues/7

jon-mago commented 5 years ago

Hopefully with the 0.2.0 release of aws.ec2metadata this should be improved. The timeout for checking metadata has been decreased from the curl default of 10s to 1s (to bring it inline with boto3).

jyoungs commented 5 years ago

Hey @jon-mago - that timeout will help. What would be better (for us) is if aws.signature followed the typical credential precedence structure (check .aws/credentials before Instance/Container IAM Roles).

Seems pretty consistent:

Thoughts on why CloudyR is different, and maybe offerening the regular pattern as an option?

jon-mago commented 5 years ago

I'm not sure why it's different. I suspect the answer is merely "because it was written that way quite a while ago".

While it would be a kind of breaking change to "put the order right", I'm not averse to that, as I suspect it will break few things in practice. I will have a think on how to offer the standard order as an option, so it can be a more gradual transition.