cloudyr / aws.s3

Amazon Simple Storage Service (S3) API Client
https://cloud.r-project.org/package=aws.s3
Is this package actively maintained? #433

Open davidbudzynski opened 12 months ago

davidbudzynski commented 12 months ago

I can see that this package hasn't received any commits to master since 2020 and there are a lot of unresolved issues without any response to them from the devs. Is it still active?

odysseu commented 10 months ago

Doesn't look like it is... 😞

davidbudzynski commented 10 months ago

Unfortunately, the entire cloudyr site looks like it has been abandoned. For some people it may make more sense to just use aws-cli instead of an R-specific layer to interact with AWS.

odysseu commented 10 months ago

For other S3-compatible solutions like MinIO, you might have better luck with https://github.com/paws-r/paws

DyfanJones commented 10 months ago

Hi All,

Just to give a little context around paws. Paws is designed to be an AWS SDK. It aims to give the full suite of AWS services from within R.

This means it follows the style of other AWS SDKs. For example, compare boto3 (the Python AWS SDK) with paws:

# Python (boto3)
import boto3

client = boto3.client("s3")
client.download_file(Bucket = "mybucket", Key = "path/to/my/file.txt", Filename = "file.txt")

# R (paws)
client = paws::s3()
client$download_file(Bucket = "mybucket", Key = "path/to/my/file.txt", Filename = "file.txt")

This is a different approach from aws.s3 and the other cloudyr packages, which offer a more R-like interface.

aws.s3::save_object("path/to/my/file.txt", file = "file.txt", bucket = "mybucket")

# e.g. a helper wrapper function for more R-friendly use
aws.s3::s3read_using(FUN = readLines, object = "path/to/my/file.txt", bucket = "mybucket")

aws.s3::s3read_using is ultimately a wrapper around aws.s3::save_object and the function passed as the FUN parameter (readLines in the above example).
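The wrapper pattern described above can be sketched in a few lines. This is only an illustration in Python, not the aws.s3 implementation: `fake_save_object`, `read_using` and `read_lines` are hypothetical names, the download is stubbed out so no AWS access is needed, and the use of a temporary file is an assumption about how such a wrapper would typically work.

```python
import os
import tempfile


def fake_save_object(key, bucket, file):
    """Hypothetical stand-in for a download call: writes dummy content locally."""
    with open(file, "w", encoding="utf-8") as fh:
        fh.write(f"contents of s3://{bucket}/{key}\n")
    return file


def read_using(fun, key, bucket, save_object=fake_save_object):
    """Download the object to a temp file, apply `fun` to its path, then clean up."""
    fd, tmp = tempfile.mkstemp(suffix=os.path.splitext(key)[1])
    os.close(fd)  # only the path is needed; the downloader reopens the file
    try:
        save_object(key, bucket, tmp)
        return fun(tmp)
    finally:
        os.remove(tmp)


def read_lines(path):
    """Rough analogue of R's readLines: return the file's lines without newlines."""
    with open(path, encoding="utf-8") as fh:
        return fh.read().splitlines()


lines = read_using(read_lines, "path/to/my/file.txt", "mybucket")
print(lines)
```

Any reader function that accepts a file path can be passed in place of `read_lines`, which is what makes this style of helper convenient.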

paws doesn't aim to replace these helper functions (e.g. aws.s3::s3read_using); however, there are several packages that use paws to give a more R-friendly interface.

Because paws aims to offer a wide range of AWS services, it can also support packages that need connections to AWS services other than S3.

As the current maintainer of paws I am biased towards paws; however, aws.s3 and the other cloudyr packages are an excellent set of tools for interfacing with AWS.

Because aws.s3 hasn't had any updates lately, it does suffer from some of the latest AWS changes, e.g. redirects:

aws.s3::save_object(object = "path/to/my/file.txt", bucket = "mybucket", 
                    file = "file.txt")
#> List of 6
#>  $ Code     : chr "PermanentRedirect"
#>  $ Message  : chr "The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future "| __truncated__
#>  $ Endpoint : chr "mybucket.s3.amazonaws.com"
#>  $ Bucket   : chr "mybucket"
#>  $ RequestId: chr "RXE9GRFR52YZ31VP"
#>  $ HostId   : chr "9ZoyfJpqpfzVZOsxP8r8Xj/93oYPLOuxmkW5BVUGxWYal2WbPli2yyj1CX0TXwJPmxUZgo9Vnqc="
#>  - attr(*, "headers")=List of 7
#>   ..$ x-amz-bucket-region: chr "eu-west-1"
#>   ..$ x-amz-request-id   : chr "RXE9GRFR52YZ31VP"
#>   ..$ x-amz-id-2         : chr "9ZoyfJpqpfzVZOsxP8r8Xj/93oYPLOuxmkW5BVUGxWYal2WbPli2yyj1CX0TXwJPmxUZgo9Vnqc="
#>   ..$ content-type       : chr "application/xml"
#>   ..$ transfer-encoding  : chr "chunked"
#>   ..$ date               : chr "Fri, 10 Nov 2023 11:52:24 GMT"
#>   ..$ server             : chr "AmazonS3"
#>   ..- attr(*, "class")= chr [1:2] "insensitive" "list"
#>  - attr(*, "class")= chr "aws_error"
#>  - attr(*, "request_canonical")= chr "GET\n/mybucket/path/to/my/file.txt\n\nhost:s3.amazonaws.com\nx-amz-date:20231110T115224Z\n\nhost;x-amz-date\ne3b0"| __truncated__
#>  - attr(*, "request_string_to_sign")= chr "AWS4-HMAC-SHA256\n20231110T115224Z\n20231110/us-east-1/s3/aws4_request\n544435802593214ecab8eb95fa7623779125eae"| __truncated__
#>  - attr(*, "request_signature")= chr "AWS4-HMAC-SHA256 Credential=DUMMY/20231110/us-east-1/s3/aws4_request,SignedHeaders=host;x-amz-da"| __truncated__
#> NULL
#> Error in parse_aws_s3_response(r, Sig, verbose = verbose): Moved Permanently (HTTP 301).

Created on 2023-11-10 with reprex v2.0.2

Whereas paws can handle these:

client = paws::s3()
client$download_file(Bucket = "mybucket", Key = "path/to/my/file.txt", Filename = "file.txt")
#> list()
file.exists("file.txt")
#> [1] TRUE

Created on 2023-11-10 with reprex v2.0.2
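For anyone wondering what "handling" a redirect like this can involve, here is a minimal sketch in Python. It is not how paws implements it. It only assumes that the 301 response carries the `x-amz-bucket-region` header (as in the aws.s3 output above) and retries once against that region. `fake_send` and `get_object` are hypothetical names, and the HTTP call is mocked.

```python
def fake_send(region, bucket, key):
    """Hypothetical stand-in for an HTTP request; the bucket 'lives' in eu-west-1."""
    actual_region = "eu-west-1"
    if region != actual_region:
        # Mimic the PermanentRedirect response shown in the error output above
        return {
            "status": 301,
            "headers": {"x-amz-bucket-region": actual_region},
            "body": b"",
        }
    return {"status": 200, "headers": {}, "body": b"hello"}


def get_object(bucket, key, region="us-east-1", send=fake_send, max_redirects=1):
    """Send the request and, on a 301, retry in the region named by the response."""
    for _ in range(max_redirects + 1):
        resp = send(region, bucket, key)
        if resp["status"] != 301:
            return region, resp
        new_region = resp["headers"].get("x-amz-bucket-region")
        if not new_region or new_region == region:
            break  # nothing useful to retry with
        region = new_region
    raise RuntimeError("could not resolve the bucket region")


region, resp = get_object("mybucket", "path/to/my/file.txt")
print(region, resp["status"])
```

A real client would also need to re-sign the request for the new region, since the region is part of the signature scope.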

Final note, paws is generated from AWS's own API definitions so new services and methods are added within each release.

tyner commented 6 months ago

Just curious, does any benchmarking exist in terms of the relative speed of the cloudyr package functionalities versus their paws counterparts? For example, if we switch from the former to the latter, might we expect speed improvements when accessing S3 objects?

DyfanJones commented 6 months ago

@tyner consistent benchmarking can be difficult due to a number of factors.

However, performance differences can still be identified. Ultimately, both packages use the curl package to make the API call to AWS, so the biggest performance difference would come from how fast each package can make the call and parse the results.

For this PR https://github.com/paws-r/paws/pull/762 there is a benchmark that mocks the response from AWS (to remove the effect of the internet connection and AWS API latency) in order to zone in on the performance of making the AWS API call and parsing the results. It isn't an intensive benchmark, but I hope it helps.
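The general idea of benchmarking against a mocked response can be sketched as follows. This is a Python illustration, not the benchmark from the PR: the XML body is a simplified, made-up stand-in for an S3 listing response, and `mock_send`, `parse_keys` and `call_and_parse` are hypothetical names.

```python
import timeit
import xml.etree.ElementTree as ET

# Canned response body so that no network or AWS latency is measured
MOCK_BODY = b"""<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult>
  <Name>mybucket</Name>
  <Contents><Key>path/to/my/file.txt</Key><Size>12</Size></Contents>
  <Contents><Key>path/to/other.txt</Key><Size>34</Size></Contents>
</ListBucketResult>"""


def mock_send():
    """Stand-in for the HTTP call: returns the canned body immediately."""
    return MOCK_BODY


def parse_keys(body):
    """Parse the XML body and collect the object keys."""
    root = ET.fromstring(body)
    return [item.findtext("Key") for item in root.iter("Contents")]


def call_and_parse():
    return parse_keys(mock_send())


keys = call_and_parse()
runs = 1000
elapsed = timeit.timeit(call_and_parse, number=runs)
print(f"{elapsed / runs * 1e6:.1f} microseconds per call")
```

Because the response is fixed, the timing reflects only the client-side work of building the call and parsing the result, which is the part the two packages can actually differ on.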

Side note: performance has been a big focus in paws lately (https://github.com/paws-r/paws/blob/main/paws.common/NEWS.md), with several functions being rewritten in C++ to improve the overall speed of the SDK.

tyner commented 5 months ago

Thanks @DyfanJones, I did my own comparison of aws.s3::put_object versus s3fs::s3_file_upload and the timings were roughly the same. Looking forward to re-running the test under the new version of paws.common!

philiporlando commented 5 months ago

Looks like AWS' blog recommends using {paws} as well.