cloudyr / aws.s3

Amazon Simple Storage Service (S3) API Client
https://cloud.r-project.org/package=aws.s3
381 stars 148 forks source link

save_object: wrap s3HTTP call in try to be able to remove files created for missing objects. #397

Open cstepper opened 3 years ago

cstepper commented 3 years ago

Hi,

one more suggestion for save_object() related to #288.

Currently the file is created and populated with the response error, if the relevant object/key is not available on the s3 bucket.

library(aws.s3)

bucket = "1000genomes"

key = "abc.txt"
file = file.path(tempdir(), bucket, key)

# current behavior 
#   file is created and http-error is written to file 
res = try(save_object(object = key, bucket = bucket, file = file))
#> Error in parse_aws_s3_response(r, Sig, verbose = verbose) : 
#>   Not Found (HTTP 404).

res
#> [1] "Error in parse_aws_s3_response(r, Sig, verbose = verbose) : \n  Not Found (HTTP 404).\n"
#> attr(,"class")
#> [1] "try-error"
#> attr(,"condition")
#> <http_404 in parse_aws_s3_response(r, Sig, verbose = verbose): Not Found (HTTP 404).>

xml2::read_xml(file)
#> {xml_document}
#> <Error>
#> [1] <Code>NoSuchKey</Code>
#> [2] <Message>The specified key does not exist.</Message>
#> [3] <Key>abc.txt</Key>
#> [4] <RequestId>16MVA0HKJMWCNBRF</RequestId>
#> [5] <HostId>E90D/iq42bKJwh6zwdE7uk8qulfek1dHBqTeEy9y+IpgP3e6Qk6R8R5V2x/k5225Y ...

Created on 2021-09-14 by the reprex package (v2.0.1)

What about wrapping the s3HTTP call in a try, and then remove the file (and directory) on try-error? This would result in the following:

library(aws.s3)

bucket = "1000genomes"

key = "abc.txt"
file = file.path(tempdir(), bucket, key)

# new behavior 
#   file is created, result is captured, file deleted if error
res = try(save_object(object = key, bucket = bucket, file = file))
#> Error : Error in parse_aws_s3_response(r, Sig, verbose = verbose) : 
#>   Not Found (HTTP 404).

res
#> [1] "Error : Error in parse_aws_s3_response(r, Sig, verbose = verbose) : \n  Not Found (HTTP 404).\n\n"
#> attr(,"class")
#> [1] "try-error"
#> attr(,"condition")
#> <simpleError: Error in parse_aws_s3_response(r, Sig, verbose = verbose) : 
#>   Not Found (HTTP 404).
#> >

file.exists(file)
#> [1] FALSE

Created on 2021-09-14 by the reprex package (v2.0.1)

Happy to discuss if this helps.

Thanks, Christoph

yogat3ch commented 2 years ago

Bumping this bug fix as it really needs to be implemented. Is anyone maintaining this package @s-u? If I wrote a PR for this will it get reviewed?

s-u commented 2 years ago

Well, the entire error handling requires a re-write, because it's a mess. I have started a branch unify-response that was aimed at addressing that, but a) I wasn't sure how much code relies on the current behavior and b) at some point ran out of time. So I'll have a look, but I'm currently tied up/away for the next few weeks.

yogat3ch commented 2 years ago

Thanks for the reply @s-u I'll take a look soon