cloudyr / aws.s3

Amazon Simple Storage Service (S3) API Client
https://cloud.r-project.org/package=aws.s3
381 stars 147 forks source link

parse_aws_s3_response swallows S3 error information #306

Open maxscheiber opened 5 years ago

maxscheiber commented 5 years ago

It is not currently possible to programmatically respond to all classes of S3 errors because parse_aws_s3_response swallows that information, instead using httr::stop_for_status.

See:

https://github.com/cloudyr/aws.s3/blob/8c722f8e84ea9f645ea394ef0908cfc003251155/R/s3HTTP.R#L238-L246

It would be great to have the option to return the actual out object, which is being printed, to the caller, or at least to propagate it as part of the raised error. As it stands, I cannot gracefully handle all S3 errors because Amazon attaches additional information to their HTTP 4xx errors (see https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html).

s-u commented 4 years ago

I started re-writing the error handling to unify the responses such that an error object is always returned which contains all vital information: https://github.com/cloudyr/aws.s3/tree/unify-response

The idea is that all functions by default cause HTTP errors to create an error condition object containing the error, request and response:

> aws.s3::get_bucket("non-existent")
Error in aws.s3::get_bucket("non-existent") : 
  S3 request was unsuccessful (Client error: (404) Not Found)

> str(tryCatch(aws.s3::get_bucket("non-existent"), error=function(e) e))
List of 4
 $ message: chr "S3 request was unsuccessful (Client error: (404) Not Found)"
 $ call   : language aws.s3::get_bucket("non-existent")
 $ result :List of 10
  ..$ url        : chr "https://s3.amazonaws.com/non-existent/"
  ..$ status_code: int 404
  ..$ headers    :List of 6
  .. ..$ x-amz-request-id : chr "XXX"
  .. ..$ x-amz-id-2       : chr "XXX"
  .. ..$ content-type     : chr "application/xml"
  .. ..$ transfer-encoding: chr "chunked"
  .. ..$ date             : chr "Thu, 19 Mar 2020 03:47:52 GMT"
  .. ..$ server           : chr "AmazonS3"
  .. ..- attr(*, "class")= chr [1:2] "insensitive" "list"
  ..$ all_headers:List of 1
  .. ..$ :List of 3
  .. .. ..$ status : int 404
  .. .. ..$ version: chr "HTTP/1.1"
  .. .. ..$ headers:List of 6
  .. .. .. ..$ x-amz-request-id : chr "XXX"
  .. .. .. ..$ x-amz-id-2       : chr "XXX"
  .. .. .. ..$ content-type     : chr "application/xml"
  .. .. .. ..$ transfer-encoding: chr "chunked"
  .. .. .. ..$ date             : chr "Thu, 19 Mar 2020 03:47:52 GMT"
  .. .. .. ..$ server           : chr "AmazonS3"
  .. .. .. ..- attr(*, "class")= chr [1:2] "insensitive" "list"
  ..$ cookies    :'data.frame': 0 obs. of  7 variables:
  .. ..$ domain    : logi(0) 
  .. ..$ flag      : logi(0) 
  .. ..$ path      : logi(0) 
  .. ..$ secure    : logi(0) 
  .. ..$ expiration: 'POSIXct' num(0) 
  .. ..$ name      : logi(0) 
  .. ..$ value     : logi(0) 
  ..$ content    : raw [1:302] 3c 3f 78 6d ...
  ..$ date       : POSIXct[1:1], format: "2020-03-19 03:47:52"
  ..$ times      : Named num [1:6] 0 0.00307 0.21874 0.68487 0.92469 ...
  .. ..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect" "pretransfer" ...
  ..$ request    :List of 7
  .. ..$ method    : chr "GET"
  .. ..$ url       : chr "https://s3.amazonaws.com/non-existent/"
  .. ..$ headers   : Named chr [1:4] "application/json, text/xml, application/xml, */*" "20200319T034752Z" "XXX" "AWS4-HMAC-SHA256 Credential=XXX/20200319/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-d"| __truncated__
  .. .. ..- attr(*, "names")= chr [1:4] "Accept" "x-amz-date" "x-amz-content-sha256" "Authorization"
  .. ..$ fields    : NULL
  .. ..$ options   :List of 2
  .. .. ..$ useragent: chr "libcurl/7.54.0 r-curl/4.3 httr/1.4.1"
  .. .. ..$ httpget  : logi TRUE
  .. ..$ auth_token: NULL
  .. ..$ output    : list()
  .. .. ..- attr(*, "class")= chr [1:2] "write_memory" "write_function"
  .. ..- attr(*, "class")= chr "request"
  ..$ handle     :Class 'curl_handle' <externalptr> 
  ..- attr(*, "class")= chr "response"
  ..- attr(*, "x-amz-request-id")= chr "XXX"
  ..- attr(*, "x-amz-id-2")= chr "XXX"
  ..- attr(*, "content-type")= chr "application/xml"
  ..- attr(*, "transfer-encoding")= chr "chunked"
  ..- attr(*, "date")= chr "Thu, 19 Mar 2020 03:47:52 GMT"
  ..- attr(*, "server")= chr "AmazonS3"
  ..- attr(*, "status_code")= int 404
 $ args   :List of 4
  ..$ verb          : chr "GET"
  ..$ bucket        : chr "non-existent"
  ..$ query         :List of 4
  .. ..$ prefix   : NULL
  .. ..$ delimiter: NULL
  .. ..$ max-keys : NULL
  .. ..$ marker   : NULL
  ..$ parse_response: logi TRUE
 - attr(*, "class")= chr [1:3] "S3error" "error" "condition"

> tryCatch(aws.s3::get_bucket("non-existent"), error=function(e) rawToChar(e$result$content))
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message><BucketName>non-existent</BucketName><RequestId>XXX</RequestId><HostId>XXX</HostId></Error>"

There is still also an option silent=TRUE which allows requests to fail silently, but it is not set by default. One nice thing about this is that it allows us to do re-tries for re-directs, so now an operation won't fail if you pick the wrong region, since we can re-try with the correct region automatically.

If anyone has strong opinions about the design, please comment.