apache / opendal

Apache OpenDAL: access data freely.
https://opendal.apache.org
Apache License 2.0
3.35k stars 472 forks source link

backend/s3: Anonymous access not supported #54

Closed BohuTANG closed 2 years ago

BohuTANG commented 2 years ago

Background

The s3://repo.databend.rs/dataset/stateful/ontime.csv ACL is Everyone (public access) on AWS S3.

If the credentials not set for OpenDAL: image

will get the error:

displayText = Parse csv error at line 0, cause: unexpected: (op: read, path: /dataset/stateful/ontime.csv, source: failed to construct request: No credentials in the property bag

Enhancement It would be better to add a test for minio bucket with public ACL.

BohuTANG commented 2 years ago

Found another issue with setting credentials and S3_STORAGE_ENDPOINT_URL: export S3_STORAGE_ENDPOINT_URL=https://s3.amazonaws.com

displayText = Parse csv error at line 0, cause: unexpected: (op: read, path: /dataset/stateful/ontime.csv, source: Error { code: "PermanentRedirect", message: "The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.", request_id: "xx", s3_extended_request_id: "xx" }).

Update:

I have test with the url: https://s3.amazonaws.com/repo.databend.rs/dataset/stateful/ontime.csv -- returns the same error

If we append the region, it works well: https://s3.us-east-2.amazonaws.com/repo.databend.rs/dataset/stateful/ontime.csv

Xuanwo commented 2 years ago

export S3_STORAGE_ENDPOINT_URL=https://s3.amazonaws.com

It's buggy now.

The behavior between keeping S3_STORAGE_ENDPOINT_URL empty and setting S3_STORAGE_ENDPOINT_URL=https://s3.amazonaws.com/ is different.

I'm considering removing the region option so that:

BohuTANG commented 2 years ago

Thank you.

The attempting to access must be addressed using the specified endpoint error issue already addressed in: https://github.com/datafuselabs/opendal/issues/37

Xuanwo commented 2 years ago

A bit complex but it is possible.

SDK's maintainer gives a workaround: https://github.com/awslabs/aws-sdk-rust/issues/425#issuecomment-1020265854

And I think we can inject the middleware to make it better:

let builder = aws_smithy_client::Builder::dyn_https()
    .middleware(AwsS3::middleware::DefaultMiddleware::new()) <-- inject middleware here.
    .build();

Ok(Arc::new(Backend {
    // Make `/` as the default of root.
    root,
    bucket: self.bucket.clone(),
    client: AwsS3::Client::with_config(builder, cfg.build()),
}))

We can config OperationSigningConfig.signing_requirements to SigningRequirements::Disabled maybe.

But it may take some time to figure out how to make it works.

BohuTANG commented 2 years ago

Thanks. Maybe we can prioritize this issue https://github.com/datafuselabs/opendal/issues/12 first, so that databend side can continue to list all files and read them for COPY command.

Xuanwo commented 2 years ago

Maybe we can use:

let client = aws_sdk_s3::Client::with_config(
    aws_sdk_s3::client::Builder::new()
        .connector(DynConnector::new(my_custom_connector)) // Now with `DynConnector`
        .middleware(DynMiddleware::new(my_custom_middleware)) // Now with `DynMiddleware`
        .default_async_sleep()
        .build(),
    config
);

Read https://github.com/awslabs/aws-sdk-rust/releases/tag/v0.7.0 for more details.

Xuanwo commented 2 years ago

Just to note, here is my today's exploration, I get the props from the request successfully:

use aws_smithy_client::{Builder, Client};

let hyper_connector = aws_smithy_client::hyper_ext::Adapter::builder()
    .build(aws_smithy_client::conns::https());
// let connector = DynConnector::new()

let x = aws_smithy_client::Builder::new()
    .connector(hyper_connector)
    .middleware(aws_smithy_client::erase::DynMiddleware::new(
        tower::util::MapRequestLayer::new(|req: aws_smithy_http::operation::Request| {
            let (inner, mut props) = req.into_parts();

            {
                let mut p = props.acquire_mut();
                let x = p.get_mut::<usize>();
                println!("{:?}", x);
            }

            aws_smithy_http::operation::Request::from_parts(inner, props)
        }),
    ))
    .build();

let _ = aws_sdk_s3::Client::with_config(x.into_dyn(), cfg.build());
Xuanwo commented 2 years ago

It works now!

image