delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Use Aliyun OSS as storage backend #2361

Open Veiasai opened 4 months ago

Veiasai commented 4 months ago

Description

I believe OSS is compatible with AWS S3, but when I tried it with the Python deltalake package I ran into auth issues.

How can I turn on verbose logging?

(By the way, aws-cli works well once I configure the endpoint, region, and credentials. I made the same changes in deltalake's storage_options.)

ion-elgreco commented 4 months ago

@Veiasai add this env variable: RUST_LOG='debug'
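
One way to apply this suggestion without changing the shell environment is to launch the script from Python with RUST_LOG set only for the child process. This is a minimal sketch: the helper name run_with_rust_log and the script name read_table.py are hypothetical, and it assumes the Rust logger reads RUST_LOG from the process environment at startup.

```python
import os
import subprocess
import sys


def run_with_rust_log(script_args, level="debug"):
    """Run a Python child process with RUST_LOG set in its environment.

    `script_args` are the arguments passed to the interpreter,
    e.g. ["read_table.py"] (a hypothetical script that opens the table).
    """
    env = dict(os.environ)      # copy, so the parent environment is untouched
    env["RUST_LOG"] = level     # same effect as `RUST_LOG=debug python ...`
    return subprocess.run(
        [sys.executable, *script_args],
        env=env,
        capture_output=True,
        text=True,
    )


# Example (hypothetical script name); inspect stdout and stderr for the logs:
# result = run_with_rust_log(["read_table.py"])
# print(result.stdout, result.stderr)
```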

Veiasai commented 4 months ago

[2024-03-29T09:37:25Z DEBUG deltalake_aws] S3LogStoreFactory has been asked to create a LogStore without the dynamodb locking provider
[2024-03-29T09:37:25Z DEBUG reqwest::connect] starting new connection: https://oss-cn-hangzhou.aliyuncs.com/
[2024-03-29T09:37:25Z DEBUG hyper::client::connect::dns] resolving host="oss-cn-hangzhou.aliyuncs.com"
[2024-03-29T09:37:25Z DEBUG hyper::client::connect::http] connecting to 118.31.219.236:443
[2024-03-29T09:37:25Z DEBUG hyper::client::connect::http] connected to 118.31.219.236:443
[2024-03-29T09:37:25Z DEBUG rustls::client::hs] No cached session for DnsName("oss-cn-hangzhou.aliyuncs.com")
[2024-03-29T09:37:25Z DEBUG rustls::client::hs] Not resuming any session
[2024-03-29T09:37:25Z DEBUG rustls::client::hs] ALPN protocol is Some(b"http/1.1")
[2024-03-29T09:37:25Z DEBUG rustls::client::hs] Using ciphersuite TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
[2024-03-29T09:37:25Z DEBUG rustls::client::tls12::server_hello] Server supports tickets
[2024-03-29T09:37:25Z DEBUG rustls::client::tls12] ECDHE curve is ECParameters { curve_type: NamedCurve, named_group: X25519 }
[2024-03-29T09:37:25Z DEBUG rustls::client::tls12] Server DNS name is DnsName("oss-cn-hangzhou.aliyuncs.com")
[2024-03-29T09:37:25Z DEBUG hyper::proto::h1::io] flushed 509 bytes
[2024-03-29T09:37:25Z DEBUG hyper::proto::h1::io] parsed 7 headers
[2024-03-29T09:37:25Z DEBUG hyper::proto::h1::conn] incoming body is content-length (1672 bytes)
[2024-03-29T09:37:25Z DEBUG hyper::proto::h1::conn] incoming body completed
[2024-03-29T09:37:25Z DEBUG hyper::client::pool] pooling idle connection for ("https", oss-cn-hangzhou.aliyuncs.com)
[2024-03-29T09:37:25Z DEBUG hyper::client::pool] reuse idle connection for ("https", oss-cn-hangzhou.aliyuncs.com)
[2024-03-29T09:37:25Z DEBUG hyper::proto::h1::io] flushed 3924 bytes
[2024-03-29T09:37:25Z DEBUG hyper::proto::h1::io] parsed 8 headers
[2024-03-29T09:37:25Z DEBUG hyper::proto::h1::conn] incoming body is content-length (374 bytes)
[2024-03-29T09:37:25Z DEBUG hyper::proto::h1::conn] incoming body completed
[2024-03-29T09:37:25Z DEBUG hyper::client::pool] pooling idle connection for ("https", oss-cn-hangzhou.aliyuncs.com)
[2024-03-29T09:37:25Z DEBUG rustls::common_state] Sending warning alert CloseNotify

Hmm, it doesn't show the raw HTTP request.

pandada8 commented 1 month ago

You can use the deltalake Python package with Aliyun OSS by setting the following environment variables (replace <region> with your bucket's region, e.g. cn-beijing):

export AWS_ACCESS_KEY_ID=<YOUR_ACCESS_KEY_ID>
export AWS_SECRET_ACCESS_KEY=<YOUR_SECRET_ACCESS_KEY>
export AWS_ENDPOINT_URL=https://<YOUR_BUCKET>.oss-<region>.aliyuncs.com
export AWS_VIRTUAL_HOSTED_STYLE_REQUEST=true
export AWS_COPY_IF_NOT_EXISTS=header-with-status:x-oss-forbid-overwrite:true:409
export AWS_REGION=<region> # like cn-beijing

If you want to use plain HTTP to gain some extra performance, also allow HTTP and switch the endpoint scheme:

export AWS_ALLOW_HTTP=1
export AWS_ENDPOINT_URL=http://<YOUR_BUCKET>.oss-<region>.aliyuncs.com

You can also use the internal endpoint, <YOUR_BUCKET>.oss-<region>-internal.aliyuncs.com, when your client runs in the same region as the bucket.
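
The same settings can be collected into a dictionary, which may be convenient if you prefer passing storage_options over exporting environment variables. The sketch below only builds that dictionary. The helper name oss_storage_options and its parameters are hypothetical, and whether storage_options behaves identically to the environment variables above is an assumption that this thread does not confirm.

```python
def oss_storage_options(bucket, region, access_key_id, secret_access_key,
                        internal=False, use_http=False):
    """Build the settings listed above as a dict (hypothetical helper)."""
    # The internal endpoint adds "-internal" after the region.
    suffix = "-internal" if internal else ""
    host = f"{bucket}.oss-{region}{suffix}.aliyuncs.com"
    scheme = "http" if use_http else "https"

    options = {
        "AWS_ACCESS_KEY_ID": access_key_id,
        "AWS_SECRET_ACCESS_KEY": secret_access_key,
        "AWS_ENDPOINT_URL": f"{scheme}://{host}",
        "AWS_VIRTUAL_HOSTED_STYLE_REQUEST": "true",
        "AWS_COPY_IF_NOT_EXISTS":
            "header-with-status:x-oss-forbid-overwrite:true:409",
        "AWS_REGION": region,
    }
    if use_http:
        # Plain HTTP has to be allowed explicitly, as in the comment above.
        options["AWS_ALLOW_HTTP"] = "1"
    return options


# Assumed usage (not verified in this thread; the table path is a placeholder):
# from deltalake import DeltaTable
# opts = oss_storage_options("my-bucket", "cn-beijing", "<key id>", "<secret>")
# table = DeltaTable("s3://my-bucket/path/to/table", storage_options=opts)
```

Keeping the endpoint construction in one place makes it easy to switch between the public and internal hosts without retyping the bucket and region.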