cloudyr / aws.s3

Amazon Simple Storage Service (S3) API Client
https://cloud.r-project.org/package=aws.s3

Unable to use a custom base_url with delete_object #189

Closed · amoeba closed this issue 6 years ago

amoeba commented 6 years ago

I was writing a function to delete an object from a DigitalOcean Space (which follows the AWS S3 API) using aws.s3::delete_object and was getting an error about a bucket not being found. A quick debug and a look at the source of delete_object revealed the cause: a superfluous call to

https://github.com/cloudyr/aws.s3/blob/c28925d8237979273c3bc57df67629b979891a06/R/delete_object.R#L21

It's superfluous in that its result isn't used later in the function body, and, because I'm querying a custom endpoint and the call appears to be hard-coded to only support AWS, I was getting an error.

Since this line of code isn't doing anything, I'm going to submit a PR; if you agree with my change, this can get closed out quickly.
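
For reference, here is the kind of call this issue is trying to support. A minimal sketch: the object and Space names are illustrative, and it assumes extra arguments such as base_url are forwarded through ... to s3HTTP.

# Illustrative only: names are made up; base_url is assumed to be
# passed through ... to s3HTTP.
aws.s3::delete_object(object = "path/to/file.csv",
                      bucket = "my-space",
                      base_url = "nyc3.digitaloceanspaces.com")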

kaneplusplus commented 6 years ago

Please note, this is also fixed in pull request #194.

amoeba commented 6 years ago

Glad to hear it, @kaneplusplus! I guess I'll leave this open for now, though I'm okay with a maintainer closing it if needed.

leeper commented 6 years ago

I think this is now fixed, as it is essentially the same problem as https://github.com/cloudyr/aws.s3/issues/191. Let me know if it's not working and I will reopen. You'll need to set the AWS_S3_ENDPOINT environment variable.
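
For example (this is the value that eventually works later in this thread):

Sys.setenv("AWS_S3_ENDPOINT" = "digitaloceanspaces.com")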

amoeba commented 6 years ago

Great, thanks @leeper. That fix looks like it'll work for my use case so I'll give it a run through soon and let you know if I run into any issues.

leeper commented 6 years ago

Great. Thanks!

amoeba commented 6 years ago

Hey @leeper, now that I sit down to test this, it isn't working.

I spent some time yesterday and today attempting to debug my issue and I've had no luck. I get a SignatureDoesNotMatch error seemingly no matter what type of request I make:

> Sys.setenv(AWS_S3_ENDPOINT="nyc3.digitaloceanspaces.com",
           AWS_ACCESS_KEY_ID = MY_DO_SPACES_KEY,
           AWS_SECRET_ACCESS_KEY = MY_DO_SPACES_SECRET)

> aws.s3::s3HTTP(verb = "GET")
List of 3
 $ Code     : chr "SignatureDoesNotMatch"
 $ RequestId: chr "tx0000000000000000d753a-005ad3e457-39dd70-nyc3a"
 $ HostId   : chr "39dd70-nyc3a-nyc"
 - attr(*, "headers")=List of 6
  ..$ content-length           : chr "190"
  ..$ x-amz-request-id         : chr "tx0000000000000000d753a-005ad3e457-39dd70-nyc3a"
  ..$ accept-ranges            : chr "bytes"
  ..$ content-type             : chr "application/xml"
  ..$ date                     : chr "Sun, 15 Apr 2018 23:46:31 GMT"
  ..$ strict-transport-security: chr "max-age=15552000; includeSubDomains; preload"
  ..- attr(*, "class")= chr [1:2] "insensitive" "list"
 - attr(*, "class")= chr "aws_error"
 - attr(*, "request_canonical")= chr "GET\n/\n\nhost:nyc3.digitaloceanspaces.com\nx-amz-date:20180415T234631Z\n\nhost;x-amz-date\ne3b0c44298fc1c149af"| __truncated__
 - attr(*, "request_string_to_sign")= chr "AWS4-HMAC-SHA256\n20180415T234631Z\n20180415/us-east-1/s3/aws4_request\nc7bd088d33d593ec90cb8ca81e13d1d5bf6cde0"| __truncated__
 - attr(*, "request_signature")= chr "AWS4-HMAC-SHA256 Credential=PKY6K4DJPR25XGGG5VQO/20180415/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-d"| __truncated__
NULL

Upon inspection, I notice the local variable region in s3HTTP is set to us-east-1 (which you can see in my output above). Adjusting this doesn't fix the issue, however. I looked at an example auth header from the DigitalOcean Spaces docs:

Authorization: AWS4-HMAC-SHA256
Credential=II5JDQBAN3JYM4DNEB6C/20170710/nyc3/s3/aws4_request,
SignedHeaders=host;x-amz-acl;x-amz-content-sha256;x-amz-date,
Signature=6cab03bef74a80a0441ab7fd33c829a2cdb46bba07e82da518cdb78ac238fda5

It looks like the Authorization header may be indicating a different set of signed fields than what is actually being sent, but I haven't figured out whether this is the case just yet.

Any insight here would be greatly appreciated!
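
For anyone debugging along: the signature here is AWS Signature Version 4. A minimal sketch of the signing-key derivation, using the digest package (the date and region values are illustrative), shows why a wrong region or date in the credential scope yields SignatureDoesNotMatch:

library(digest)

# SigV4 chains HMAC-SHA256 over date, region, and service to derive the
# signing key; a mismatch in any of these (e.g. us-east-1 vs. the region
# the server expects) changes the final signature.
signing_key <- function(secret, date, region, service = "s3") {
  k_date    <- hmac(paste0("AWS4", secret), date, algo = "sha256", raw = TRUE)
  k_region  <- hmac(k_date, region, algo = "sha256", raw = TRUE)
  k_service <- hmac(k_region, service, algo = "sha256", raw = TRUE)
  hmac(k_service, "aws4_request", algo = "sha256", raw = TRUE)
}

# The signature is then HMAC-SHA256(signing_key, string_to_sign), where
# string_to_sign is the request_string_to_sign attribute shown above.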

leeper commented 6 years ago

I can revisit this at some point. Those docs have a full example, so we should be able to add a test to aws.signature to ensure it's working at that level.

leeper commented 6 years ago

I've added a basic test to aws.signature. Unfortunately their examples don't give enough detail for a full test, but from looking at it, this seems like an aws.s3 problem rather than an aws.signature one.

Two ideas:

  1. Could be region. Is it supposed to be nyc or something? I think you're best off setting the AWS_DEFAULT_REGION environment variable.
  2. Bucket name and url_style. It looks like they may require url_style = "virtual" (which is not the default). That affects how the bucket name is attached to the endpoint URL; see the sketch below. (It also looks like you may be trying to specify the region and/or bucket name in the endpoint, but this should be a generic API URL.)
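
To make the url_style distinction concrete, here is how the same request URL is formed under each style (bucket name and endpoint are illustrative):

# url_style = "path" (the default): the bucket goes in the path
#   https://nyc3.digitaloceanspaces.com/my-space/some/key
# url_style = "virtual": the bucket goes in the subdomain
#   https://my-space.nyc3.digitaloceanspaces.com/some/key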

amoeba commented 6 years ago

Awesome, thanks @leeper. Nice to narrow it down a bit. Both your points make sense (I think) and I'll take a look at fixing our code and report back.

amoeba commented 6 years ago

Made some slight debugging progress. I now get this HTTP 400:

[1] <Code>XAmzContentSHA256Mismatch</Code>
[2] <BucketName>asdfasdfasd</BucketName>
[3] <RequestId>tx00000000000000996dd9c-005b0b6173-5c29c8-nyc3a</RequestId>
[4] <HostId>5c29c8-nyc3a-nyc</HostId>

when I run aws.s3::put_bucket(bucket = "analogsea") off a build of aws.s3's master branch, with AWS_DEFAULT_REGION, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY set to my DO Spaces region, access key, and secret key. I also commented out the section that decides url_style and re-built the package so I'd get the right URL.
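
A note on that error code: XAmzContentSHA256Mismatch means the x-amz-content-sha256 header didn't match the SHA-256 the server computed over the body it received. The expected value can be checked locally with the digest package; for instance, the well-known hash of an empty body (visible, truncated, in the request_canonical attribute earlier in this thread) is:

digest::digest("", algo = "sha256", serialize = FALSE)
## [1] "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"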

Here's the example for creating a Space (bucket) from DigitalOcean:

PUT / HTTP/1.1

Host: static-images.nyc3.digitaloceanspaces.com
x-amz-acl: public-read
x-amz-content-sha256: c6f1fc479f5f690c443b73a258aacc06ddad09eca0b001e9640ff2cd56fe5710
x-amz-date: 20170710T173143Z
Authorization: AWS4-HMAC-SHA256 Credential=II5JDQBAN3JYM4DNEB6C/20170710/nyc3/s3/aws4_request,SignedHeaders=host;x-amz-acl;x-amz-content-sha256;x-amz-date,Signature=6cab03bef74a80a0441ab7fd33c829a2cdb46bba07e82da518cdb78ac238fda5

<CreateBucketConfiguration>
  <LocationConstraint>nyc3</LocationConstraint>
</CreateBucketConfiguration>

Compared to what I end up sending:

HOST https://test.nyc3.digitaloceanspaces.com/

* x-amz-acl: private
* x-amz-date: 20180528T020035Z
* x-amz-content-sha256: 9b5bac799e64c876a137f1a521853789811af62e13cdcf4743a6dcb6c29290bd
* Authorization: AWS4-HMAC-SHA256 Credential=KSED6MLEXFXYZGPDN4KH/20180528/nyc3.digitaloceanspaces.com/s3/aws4_request, SignedHeaders=host;x-amz-acl;x-amz-date, Signature=fec8143dcfa41048547499b00c8412ea1238b55463026e892d44ff2caeff2e39

<CreateBucketConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <LocationConstraint>nyc3.digitaloceanspaces.com</LocationConstraint>
</CreateBucketConfiguration>

The first difference I see is that the request body has a different LocationConstraint. So I put a browser() into aws.s3::s3HTTP, overrode the request_body variable before the PUT got sent, and got the same error.

I also see that the Credential part of the Authorization header has the full URL rather than just the region name nyc3. So I edited that with a browser() while the function body was executing and got a

[1] <Code>SignatureDoesNotMatch</Code>
[2] <RequestId>tx00000000000000995b18c-005b0b6600-5c29c3-nyc3a</RequestId>
[3] <HostId>5c29c3-nyc3a-nyc</HostId>

which makes total sense.

The last thing I see is the difference in x-amz-acl (private vs. public). But that seems less related.
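
For anyone wanting to reproduce this kind of poking without editing the package source, a sketch of the browser()-based approach described above, using trace() instead:

library(aws.s3)

# Pause inside s3HTTP on every call; locals such as request_body and the
# signing inputs can then be inspected or overridden before the request
# is sent.
trace(s3HTTP, tracer = browser)
# ... make the failing call, poke around, then clean up:
untrace(s3HTTP)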

amoeba commented 6 years ago

Also noticed that x-amz-content-sha256 isn't in the signed headers list in the docs but is in the request I sent.

leeper commented 6 years ago

@amoeba What did you set AWS_DEFAULT_REGION to? Just nyc? I suspect that's the only problem. You can change acl in put_object() with put_object(acl = "public-read").
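
For the bucket-level call being tested here, the analogous form would presumably be (assuming put_bucket accepts the same acl argument; the bucket name is illustrative):

aws.s3::put_bucket("my-space", acl = "public-read")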

amoeba commented 6 years ago

I'm not totally sure. Let me try this again and post a more reproducible report.

amoeba commented 6 years ago

I'm not sure what I had set these to before, but I just put together a script that uses the latest build of aws.s3 from the master branch to create a Space (bucket). Pardon the verbosity but I just wanted to run through this from top to bottom again (mostly to jog my memory):

I set my env vars like:

Sys.setenv(
    "AWS_S3_ENDPOINT" = "digitaloceanspaces.com",
    "AWS_DEFAULT_REGION" = "nyc3",
    "AWS_ACCESS_KEY_ID"="...elided...",
    "AWS_SECRET_ACCESS_KEY"="...elided..."
)

When calling put_bucket, I can't find a configuration of region and base_url that works. If I call, for example,

aws.s3::put_bucket("analogsea-nyc3-test-two",
                   region = "nyc3",
                   key = Sys.getenv("DO_SPACES_ACCESS_KEY"),
                   secret = Sys.getenv("DO_SPACES_SECRET_KEY"),
                   base_url = "digitaloceanspaces.com")

(which is the code I'd ideally like to write), I get a Method Not Allowed (HTTP 405). The URL that the PUT goes to is https://digitaloceanspaces.com/{space_name}, which is missing both the region and the space name in the subdomain.

If I try to work around it by setting base_url to nyc3.digitaloceanspaces.com,

aws.s3::put_bucket("analogsea-nyc3-test-two",
                   region = "nyc3",
                   key = Sys.getenv("DO_SPACES_ACCESS_KEY"),
                   secret = Sys.getenv("DO_SPACES_SECRET_KEY"),
                   base_url = "nyc3.digitaloceanspaces.com")

I get a Bad Request (HTTP 400). The URL the PUT is being sent to in this case is https://nyc3.digitaloceanspaces.com/{space_name}, which is nearly correct.

Verbose output from the above is

> aws.s3::put_bucket("analogsea-nyc3-test-two",
+                    region = "nyc3",
+                    key = Sys.getenv("DO_SPACES_ACCESS_KEY"),
+                    secret = Sys.getenv("DO_SPACES_SECRET_KEY"),
+                    base_url = "nyc3.digitaloceanspaces.com",
+                    verbose = TRUE)
Checking for credentials in user-supplied values
Using user-supplied value for AWS Access Key ID
Using user-supplied value for AWS Secret Access Key
Using user-supplied value for AWS Region ('nyc3')
S3 Request URL: https://nyc3.digitaloceanspaces.com/analogsea-nyc3-test-two/
Executing request with AWS credentials
Checking for credentials in user-supplied values
Using user-supplied value for AWS Access Key ID
Using user-supplied value for AWS Secret Access Key
Using user-supplied value for AWS Region ('nyc3')
Checking for credentials in user-supplied values
Using user-supplied value for AWS Secret Access Key
Using user-supplied value for AWS Region ('nyc3')
Parsing AWS API response
Client error: (400) Bad Request
List of 10
 $ url        : chr "https://nyc3.digitaloceanspaces.com/analogsea-nyc3-test-two/"
 $ status_code: int 400
 $ headers    :List of 3
  ..$ date          : chr "Sun, 29 Jul 2018 03:09:13 GMT"
  ..$ content-length: chr "242"
  ..$ content-type  : chr "text/xml; charset=utf-8"
  ..- attr(*, "class")= chr [1:2] "insensitive" "list"
 $ all_headers:List of 1
  ..$ :List of 3
  .. ..$ status : int 400
  .. ..$ version: chr "HTTP/1.1"
  .. ..$ headers:List of 3
  .. .. ..$ date          : chr "Sun, 29 Jul 2018 03:09:13 GMT"
  .. .. ..$ content-length: chr "242"
  .. .. ..$ content-type  : chr "text/xml; charset=utf-8"
  .. .. ..- attr(*, "class")= chr [1:2] "insensitive" "list"
 $ cookies    :'data.frame':    0 obs. of  7 variables:
  ..$ domain    : logi(0) 
  ..$ flag      : logi(0) 
  ..$ path      : logi(0) 
  ..$ secure    : logi(0) 
  ..$ expiration: 'POSIXct' num(0) 
  ..$ name      : logi(0) 
  ..$ value     : logi(0) 
 $ content    : raw [1:242] 3c 3f 78 6d ...
 $ date       : POSIXct[1:1], format: "2018-07-29 03:09:13"
 $ times      : Named num [1:6] 0 0.000057 0.00006 0.000133 0.161529 ...
  ..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect" "pretransfer" ...
 $ request    :List of 7
  ..$ method    : chr "PUT"
  ..$ url       : chr "https://nyc3.digitaloceanspaces.com/analogsea-nyc3-test-two/"
  ..$ headers   : Named chr [1:6] "application/json, text/xml, application/xml, */*" "" "private" "20180729T030908Z" ...
  .. ..- attr(*, "names")= chr [1:6] "Accept" "Content-Type" "x-amz-acl" "x-amz-date" ...
  ..$ fields    : NULL
  ..$ options   :List of 5
  .. ..$ useragent    : chr "libcurl/7.54.0 r-curl/3.2 httr/1.3.1"
  .. ..$ post         : logi TRUE
  .. ..$ postfieldsize: int 148
  .. ..$ postfields   : raw [1:148] 3c 43 72 65 ...
  .. ..$ customrequest: chr "PUT"
  ..$ auth_token: NULL
  ..$ output    : list()
  .. ..- attr(*, "class")= chr [1:2] "write_memory" "write_function"
  ..- attr(*, "class")= chr "request"
 $ handle     :Class 'curl_handle' <externalptr> 
 - attr(*, "class")= chr "aws_error"
 - attr(*, "headers")=List of 3
  ..$ date          : chr "Sun, 29 Jul 2018 03:09:13 GMT"
  ..$ content-length: chr "242"
  ..$ content-type  : chr "text/xml; charset=utf-8"
  ..- attr(*, "class")= chr [1:2] "insensitive" "list"
 - attr(*, "request_canonical")= chr "PUT\n/analogsea-nyc3-test-two/\n\nhost:nyc3.digitaloceanspaces.com\nx-amz-acl:private\nx-amz-date:20180729T0309"| __truncated__
 - attr(*, "request_string_to_sign")= chr "AWS4-HMAC-SHA256\n20180729T030908Z\n20180729/nyc3/s3/aws4_request\naa684522d294e89c941afa692608f5a8b44441ef8a81"| __truncated__
 - attr(*, "request_signature")= chr "AWS4-HMAC-SHA256 Credential=PQX2IDA54KISDM4CFF4T/20180729/nyc3/s3/aws4_request, SignedHeaders=host;x-amz-acl;x-"| __truncated__
NULL

If I override setup_s3_url so that it sets the URL to https://analogsea-nyc3-test-two.nyc3.digitaloceanspaces.com/, I still get an HTTP 400. As before, if I dump the raw response from the API, I get this document:

<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>XAmzContentSHA256Mismatch</Code>
  <BucketName>analogsea-nyc3-test-two</BucketName>
  <RequestId>tx000000000000000dada75-005b5d320c-ac4154-nyc3a</RequestId>
  <HostId>ac4154-nyc3a-nyc</HostId>
</Error>

leeper commented 6 years ago

Thanks, again, for the really detailed info. Give version 0.3.18 (current on GitHub) a try. There was some inflexible code that seems to have been generating these errors. The SHA256 thing might be a separate bug, or it might just be a side effect of hacking around with the internals, in which case it will go away once the URL parsing is fixed.

amoeba commented 6 years ago

Thanks for the fix, but no luck with the latest build.

Running:

Sys.setenv(
...
           "AWS_S3_ENDPOINT" = "digitaloceanspaces.com",
           "AWS_DEFAULT_REGION" = "nyc3",
...
)

aws.s3::put_bucket("analogsea-nyc3-test-two-two",
                   region = "nyc3",
                   key = Sys.getenv("DO_SPACES_ACCESS_KEY"),
                   secret = Sys.getenv("DO_SPACES_SECRET_KEY"),
                   base_url = "digitaloceanspaces.com",
                   verbose = TRUE,
                   url_style = "virtual")

I get an HTTP 400 with:

<Error>
  <Code>XAmzContentSHA256Mismatch</Code>
  <BucketName>analogsea-nyc3-test-two-two</BucketName>
  <RequestId>tx0000000000000003f8c93-005b5e3b46-ad3dec-nyc3a</RequestId>
  <HostId>ad3dec-nyc3a-nyc</HostId>
</Error>

> devtools::session_info()
...
aws.s3          0.3.18  2018-07-29 local

Would it be helpful for debugging if I shared my DO Spaces API keys with you?

leeper commented 6 years ago

If you're okay with that, yes. My gmail is thosjleeper.

leeper commented 6 years ago

I can confirm with the edits just pushed that the following works:

Sys.setenv("AWS_S3_ENDPOINT" = "digitaloceanspaces.com")
Sys.setenv("AWS_DEFAULT_REGION" = "nyc3")
Sys.getenv("DO_SPACES_ACCESS_KEY" = "something")
Sys.getenv("DO_SPACES_SECRET_KEY" = "something")

put_bucket("analogsea-nyc3-test-two-two",
           location_constraint = NULL,
           key = Sys.getenv("DO_SPACES_ACCESS_KEY"),
           secret = Sys.getenv("DO_SPACES_SECRET_KEY"))
## [1] TRUE
bucketlist(key = Sys.getenv("DO_SPACES_ACCESS_KEY"),
           secret = Sys.getenv("DO_SPACES_SECRET_KEY"))
##                        Bucket             CreationDate
## 1                redactedname 2018-07-29T02:34:49.009Z
## 2 analogsea-nyc3-test-two-two 2018-07-30T00:00:37.504Z

amoeba commented 6 years ago

Works for me too! Nice work, @leeper!! I'll test other API methods shortly and close this unless I find anything.
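
For instance, the delete_object call from the top of this issue should now reduce to something like the following sketch (the object name is illustrative; credentials come from the same env vars as above):

aws.s3::delete_object(object = "path/to/file.csv",
                      bucket = "analogsea-nyc3-test-two-two",
                      key = Sys.getenv("DO_SPACES_ACCESS_KEY"),
                      secret = Sys.getenv("DO_SPACES_SECRET_KEY"))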

amoeba commented 6 years ago

This looks golden, thanks a million, @leeper. I was able to make some minor tweaks and open a PR against analogsea just tonight.