cloudyr / aws.s3

Amazon Simple Storage Service (S3) API Client
https://cloud.r-project.org/package=aws.s3

Requests are unsigned since x-amz-content-sha256 is not part of the canonical headers #362

Closed octavd closed 4 years ago

octavd commented 4 years ago

Before filing an issue, please make sure you are using the latest development version which you can install using install.packages("aws.s3",repo="https://rforge.net") (see README) since the issue may have been fixed already. Also search existing issues first to avoid duplicates.

Please specify whether your issue is about:

I have a custom S3 storage and I am getting 403 Forbidden no matter what I do when trying to retrieve the buckets. Could you please tell me what I am doing wrong?

Put your code here:

## load package
library("aws.s3")

## code goes here
bucketlist(base_url="storage", check_region = F, region="",
           use_https=T, url_style = 'virtual',
           key="key", secret="secret", verbose=T)

NULL
Error in parse_aws_s3_response(r, Sig, verbose = verbose): Forbidden (HTTP 403).
Traceback:

1. bucketlist(base_url = "url", 
 .     check_region = F, region = "", use_https = T, url_style = "virtual", 
 .     key = "key", secret = "secret", 
 .     verbose = T)
2. s3HTTP(verb = "GET", ...)
3. parse_aws_s3_response(r, Sig, verbose = verbose)
4. httr::stop_for_status(r)

## session info for your system
sessionInfo()

R version 3.6.0 (2019-04-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] aws.s3_0.3.22

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3          digest_0.6.25       crayon_1.3.4       
 [4] aws.signature_0.5.2 IRdisplay_0.7.0     R6_2.4.1           
 [7] repr_1.1.0          jsonlite_1.6.1      evaluate_0.14      
[10] httr_1.4.1          pillar_1.4.3        rlang_0.4.5        
[13] curl_4.3            uuid_0.1-4          xml2_1.2.2         
[16] IRkernel_1.1        tools_3.6.0         compiler_3.6.0     
[19] base64enc_0.1-3     htmltools_0.4.0     pbdZMQ_0.3-3

octavd commented 4 years ago

I have also tried the following:

library("aws.s3")
Sys.setenv("AWS_ACCESS_KEY_ID" = "key",
           "AWS_SECRET_ACCESS_KEY" = "secret",
           "AWS_S3_ENDPOINT"="url")
get_bucket("bucket")

and I got the following error:


Error in curl::curl_fetch_memory(url, handle = handle): Empty reply from server
Traceback:

1. get_bucket("bucket")
2. s3HTTP(verb = "GET", bucket = bucket, query = query, parse_response = parse_response, 
 .     ...)
3. httr::GET(url, H, query = query, show_progress, ...)
4. request_perform(req, hu$handle$handle)
5. request_fetch(req$output, req$url, handle)
6. request_fetch.write_memory(req$output, req$url, handle)
7. curl::curl_fetch_memory(url, handle = handle)

s-u commented 4 years ago

403 means your authentication tokens are incorrect - see the full error description for details. Please consult the documentation of your custom back-end to find out what it requires.

octavd commented 4 years ago

Hello @s-u,

First of all, thank you for the reply. Secondly, there is no problem with the credentials. I tried Postman with the same key/secret/url and it retrieved the buckets. Python + boto3 with the same key/secret/url also worked:

import boto3

session = boto3.session.Session()

s3_client = session.client(
    service_name='s3',
    aws_access_key_id='aws_access_key_id',
    aws_secret_access_key='aws_secret_access_key',
    endpoint_url='url',
)

Unfortunately, it does not seem to work with this library. I cannot view the request that is made; the output appears truncated.

Below is the full error displayed (using verbose = T):

Locating credentials

Checking for credentials in user-supplied values

Using user-supplied value for AWS Access Key ID

Using user-supplied value for AWS Secret Access Key

Using default value for AWS Region ('us-east-1')

Non-AWS base URL requested.

S3 Request URL: custom_url

Executing request with AWS credentials

Locating credentials

Checking for credentials in user-supplied values

Using user-supplied value for AWS Access Key ID

Using user-supplied value for AWS Secret Access Key

Using default value for AWS Region ('us-east-1')

Parsing AWS API response

Client error: (403) Forbidden

List of 4
 $ Code     : chr "AccessDenied"
 $ Message  : chr "Access Denied"
 $ Resource : list()
 $ RequestId: chr "id"
 - attr(*, "headers")=List of 7
  ..$ server          : chr "S3 Server"
  ..$ x-amz-id-2      : chr "id"
  ..$ x-amz-request-id: chr "id"
  ..$ content-type    : chr "application/xml"
  ..$ content-length  : chr "174"
  ..$ date            : chr "Fri, 22 May 2020 15:24:50 GMT"
  ..$ connection      : chr "keep-alive"
  ..- attr(*, "class")= chr [1:2] "insensitive" "list"
 - attr(*, "class")= chr "aws_error"
 - attr(*, "request_canonical")= chr "GET\n/\n\nhost:custom_url\nx-amz-date:20200522T152450Z\n\nho"| __truncated__
 - attr(*, "request_string_to_sign")= chr "AWS4-HMAC-SHA256\n20200522T152450Z\n20200522/us-east-1/s3/aws4_request\"| __truncated__
 - attr(*, "request_signature")= chr "AWS4-HMAC-SHA256 Credential=key/20200522/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-d"| __truncated__
NULL
Error in parse_aws_s3_response(r, Sig, verbose = verbose): Forbidden (HTTP 403).
Traceback:

1. bucketlist(base_url = "url", 
 .     region = "", use_https = TRUE, key = "key", 
 .     secret = "secret", verbose = T)
2. s3HTTP(verb = "GET", ...)
3. parse_aws_s3_response(r, Sig, verbose = verbose)
4. httr::stop_for_status(r)

Could you please advise?

Thank you very much!

s-u commented 4 years ago

Ok, thanks, I'm wondering if your back-end has issues with the signature. The issue here is that we cannot really look into anything unless you provide details on the backend - and if it is something proprietary it may be impossible to reproduce.

The other way to go about it would be to look at the request from boto3 and compare it. Note that you can simply use

s3HTTP(base_url="url", region = "", use_https=TRUE, key="key",
       secret="secret", parse_response=FALSE)

to return the actual response object with all details.
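
One way to carry out the comparison suggested above is to put the Authorization header of a working request next to that of a failing one and see which headers each of them signs. The following is a minimal Python sketch using only the standard library. The function names and both header values are hypothetical placeholders, not output captured from boto3 or aws.s3.

```python
def parse_sigv4_authorization(value):
    """Split a SigV4 Authorization header into algorithm, fields and signed headers."""
    algorithm, _, rest = value.partition(" ")
    fields = {}
    for part in rest.split(","):
        name, _, field_value = part.strip().partition("=")
        fields[name] = field_value
    # Drop empty entries so a stray leading ';' is not counted as a header name.
    signed = [h for h in fields.get("SignedHeaders", "").split(";") if h]
    return algorithm, fields, signed


def missing_signed_headers(reference_auth, other_auth):
    """Return headers signed in the reference request but not in the other one."""
    _, _, reference = parse_sigv4_authorization(reference_auth)
    _, _, other = parse_sigv4_authorization(other_auth)
    return sorted(set(reference) - set(other))


# Hypothetical values standing in for a working and a failing request.
working = ("AWS4-HMAC-SHA256 Credential=key/20200527/us-east-1/s3/aws4_request, "
           "SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=abc")
failing = ("AWS4-HMAC-SHA256 Credential=key/20200527/us-east-1/s3/aws4_request, "
           "SignedHeaders=host;x-amz-date, Signature=def")

print(missing_signed_headers(working, failing))
```

With these placeholder values the script prints ['x-amz-content-sha256'], i.e. the one header that the first request signs and the second does not.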

octavd commented 4 years ago

Hello again, @s-u,

I did some more debugging with IntelliJ and found the following:

s3HTTP builds a headers list that contains the following (from an IntelliJ debugger screenshot):

x-amz-date = (136 B) "20200527T115333Z"
x-amz-content-sha256 = (232 B) "content-sha256"
Authorization = (296 B) "AWS4-HMAC-SHA256 Credential=key/20200527/us-east-1/s3/aws4_request, SignedHeaders=;host;x-amz-date, Signature=signature"

Looking at the AWS documentation on "Using Authorization Header", I noticed that SignedHeaders is an alphabetically sorted, semicolon-separated list of lowercase request header names. The request headers in the list are the same headers that you included in the CanonicalHeaders string. For example, for the previous example, the value of SignedHeaders would be as follows:

host;x-amz-content-sha256;x-amz-date

but, as you can see, the request contains only:

SignedHeaders=;host;x-amz-date

I've tested this: if we add x-amz-content-sha256 to the SignedHeaders in the Authorization header, everything works and the bucket list is displayed.
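
The rule quoted from the AWS documentation (lowercase names, sorted alphabetically, joined with semicolons, and matching the canonical headers) can be illustrated with a short Python sketch. The helper name canonical_and_signed_headers is hypothetical, and the header values are placeholders; the payload hash shown is the SHA-256 of an empty body.

```python
def canonical_and_signed_headers(headers):
    """Build the CanonicalHeaders and SignedHeaders strings from a header dict."""
    # Lowercase the names and collapse runs of whitespace in the values.
    normalized = {name.lower(): " ".join(str(value).split())
                  for name, value in headers.items()}
    names = sorted(normalized)
    # Each canonical header line is "name:value" followed by a newline.
    canonical = "".join(f"{name}:{normalized[name]}\n" for name in names)
    signed = ";".join(names)
    return canonical, signed


# Placeholder request headers; the hash is the SHA-256 of an empty payload.
request_headers = {
    "Host": "custom_url",
    "x-amz-date": "20200527T115333Z",
    "x-amz-content-sha256":
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}

canonical, signed = canonical_and_signed_headers(request_headers)
print(signed)
```

For these three headers the SignedHeaders value comes out as host;x-amz-content-sha256;x-amz-date, with no leading semicolon.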

s-u commented 4 years ago

@octavd thanks for digging! I think I got the issue. The problem is that the design of https://github.com/cloudyr/aws.signature creates a catch-22 situation: aws.signature::signature_v4_auth() creates the signature and computes the body hash, but does NOT create the x-amz-content-sha256 header, so the signature is computed without it, hence it cannot be on the list of canonical headers. But in order to add it from the outside, one would have to include x-amz-content-sha256 in the list of canonical headers with the hash of the payload, which we don't have until we get the result of signature_v4_auth() - which we can't get without the value of x-amz-content-sha256.

So I would recommend filing an issue with https://github.com/cloudyr/aws.signature against signature_v4_auth() to provide an option to generate x-amz-content-sha256 and add it to the canonical headers since that would be the place to do it.

That said, I have added a work-around to aws.s3 so that it computes the body hash itself before calling aws.signature, so please see if that fixes your problem.
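
The ordering described here (hash the body first, then sign with x-amz-content-sha256 among the canonical headers) can be sketched in Python with only the standard library. This is an illustration of AWS Signature Version 4 under stated assumptions, not the aws.s3 or aws.signature code: the function name sign_request and all argument values are hypothetical, and only three headers are signed.

```python
import hashlib
import hmac


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


def hmac_sha256(key, message):
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def sign_request(method, path, query, host, amz_date, region, key, secret, body=b""):
    """Return request headers with a SigV4 Authorization that covers the body hash."""
    # Step 1: hash the payload before signing, so the header value is known.
    payload_hash = sha256_hex(body)
    headers = {"host": host,
               "x-amz-content-sha256": payload_hash,
               "x-amz-date": amz_date}

    # Step 2: canonical request, with the hash header among the signed headers.
    names = sorted(headers)
    canonical_headers = "".join(f"{n}:{headers[n]}\n" for n in names)
    signed_headers = ";".join(names)
    canonical_request = "\n".join(
        [method, path, query, canonical_headers, signed_headers, payload_hash])

    # Step 3: string to sign.
    date = amz_date[:8]
    scope = f"{date}/{region}/s3/aws4_request"
    string_to_sign = "\n".join(
        ["AWS4-HMAC-SHA256", amz_date, scope,
         sha256_hex(canonical_request.encode("utf-8"))])

    # Step 4: derive the signing key and compute the signature.
    signing_key = hmac_sha256(("AWS4" + secret).encode("utf-8"), date)
    for part in (region, "s3", "aws4_request"):
        signing_key = hmac_sha256(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()

    headers["Authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}")
    return headers


# Placeholder values only; no request is sent.
signed = sign_request("GET", "/", "", "custom_url", "20200527T115333Z",
                      "us-east-1", "key", "secret")
print(signed["Authorization"])
```

Because the hash is computed in step 1, it is available both as a header value and as the last line of the canonical request, which avoids the circular dependency described above.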

octavd commented 4 years ago

@s-u works like a charm now! Thank you very much for this fix. I see that you've created the issue on the aws.signature repo. Also, could you please tell me when this will be available on CRAN? For the moment I'm building the tar.gz with make and installing it from that.