apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

object_store: Add support for requester pays buckets #6768

Open kylebarron opened 19 hours ago

kylebarron commented 19 hours ago

Which issue does this PR close?

Closes #6716.

Rationale for this change

Adds support for AWS S3 Requester Pays buckets. Open data is commonly published through these buckets because the provider then pays only to store the data, while the requester pays for requests and data egress.
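
For context, a client opts in to Requester Pays by sending the `x-amz-request-payer: requester` header with each request. The sketch below illustrates that idea using only the standard library. `request_payer_header` and `build_headers` are hypothetical helpers named for illustration; they are not the code in this PR.

```rust
/// Header that S3 uses to mark a request as Requester Pays.
const REQUEST_PAYER_HEADER: &str = "x-amz-request-payer";
const REQUEST_PAYER_VALUE: &str = "requester";

/// Hypothetical helper: returns the header pair only when the option is enabled.
fn request_payer_header(enabled: bool) -> Option<(&'static str, &'static str)> {
    enabled.then_some((REQUEST_PAYER_HEADER, REQUEST_PAYER_VALUE))
}

/// Hypothetical helper: builds the header list for a request to `host`,
/// appending the Requester Pays header when `request_payer` is true.
fn build_headers(host: &str, request_payer: bool) -> Vec<(String, String)> {
    let mut headers = vec![("host".to_string(), host.to_string())];
    if let Some((name, value)) = request_payer_header(request_payer) {
        headers.push((name.to_string(), value.to_string()));
    }
    headers
}
```

With the option disabled, no extra header is sent and requests behave as before, so the setting can plausibly default to off.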

What changes are included in this PR?

Are there any user-facing changes?

Addition of AmazonS3Builder::with_request_payer

kylebarron commented 19 hours ago

I also ran some manual tests locally to confirm it works end-to-end, but I'm not sure whether there's a place for them here.

#[tokio::test]
async fn test_get_request_payer() {
    // Requires real AWS credentials: the requester is billed for the request.
    let client = AmazonS3Builder::new()
        .with_access_key_id("REDACTED")
        .with_secret_access_key("REDACTED")
        .with_bucket_name("naip-visualization")
        .with_region("us-west-2")
        // Opt in to Requester Pays for every request made by this store.
        .with_request_payer(true)
        .build()
        .unwrap();
    // Fetch a small object and print its contents as UTF-8 text.
    let resp = client.get(&"readme.txt".into()).await.unwrap();
    let buf = resp.bytes().await.unwrap();
    let s = String::from_utf8(buf.into()).unwrap();
    dbg!(s);
}

ran successfully and gave:

successes:

---- aws::client::tests::test_get_request_payer stdout ----
[src/aws/client.rs:908:9] s = "Visualization NAIP on AWS\n\nThe National Agriculture Imagery Program (NAIP) acquires aerial imagery during the agricultural growing seasons in the continental United States. This leaf-on imagery typically ranges from 60 centimeters to 100 centimeters in resolution. In the naip-visualization Amazon S3 bucket, you’ll find GeoTIFF 3-band RGB imagery at source resolution, which has been compressed, tiled, and cloud-optimized. This data is useful as source for background imagery, or to use to download a subsampled version of the original.\n\nNAIP is administered by t
...

and

#[tokio::test]
async fn test_signed_get_request_payer() {
    // Requires real AWS credentials to produce a valid signature.
    let client = AmazonS3Builder::new()
        .with_access_key_id("REDACTED")
        .with_secret_access_key("REDACTED")
        .with_bucket_name("naip-visualization")
        .with_region("us-west-2")
        // Opt in to Requester Pays; the signed URL should carry the setting too.
        .with_request_payer(true)
        .build()
        .unwrap();
    // Presign a GET for the object, valid for 60 seconds.
    let url = client
        .signed_url(Method::GET, &"readme.txt".into(), Duration::from_secs(60))
        .await
        .unwrap();
    dbg!(url.to_string());
}

gave

https://s3.us-west-2.amazonaws.com/naip-visualization/readme.txt?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=REDACTED%2F20241121%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20241121T131723Z&X-Amz-Expires=60&X-Amz-SignedHeaders=host&x-amz-request-payer=requester&X-Amz-Signature=8b9b548bd41b24a0cdf8eacfbccb3364472efdf5ebf5d9cf943e7c6c23e1d3be

which also worked.

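In the signed URL above, the opt-in appears as the query parameter `x-amz-request-payer=requester`. A minimal std-only sketch for checking that on a generated URL follows. `query_param` and `is_requester_pays_url` are hypothetical helpers, not part of `object_store`, and the parsing is simplified on purpose (no percent-decoding).

```rust
/// Hypothetical helper: returns the raw value of `key` in the query string of `url`.
/// Simplified on purpose: keys and values are compared without percent-decoding.
fn query_param<'a>(url: &'a str, key: &str) -> Option<&'a str> {
    // Everything after the first '?' is the query (plus an optional fragment).
    let (_, rest) = url.split_once('?')?;
    // Drop a trailing "#fragment" if one is present.
    let query = rest.split('#').next().unwrap_or(rest);
    query.split('&').find_map(|pair| {
        // A pair without '=' is treated as a key with an empty value.
        let (k, v) = pair.split_once('=').unwrap_or((pair, ""));
        (k == key).then_some(v)
    })
}

/// Hypothetical helper: true if the URL carries the Requester Pays parameter.
fn is_requester_pays_url(url: &str) -> bool {
    query_param(url, "x-amz-request-payer") == Some("requester")
}
```

A check like this could back an assertion in a unit test, so that the signed-URL path does not rely on manual inspection alone.
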
kylebarron commented 18 hours ago

I don't think the CI failure is related to my changes.