delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Cannot read/write DeltaTable on self-hosted S3 object storage with a self-signed SSL certificate #3001

Closed: legout closed 5 days ago

legout commented 5 days ago

Environment

Delta-rs version: 0.21.0

Binding: Python

Environment: SeaweedFS S3 with a self-signed SSL certificate


Bug

We have a self-hosted SeaweedFS S3 instance running on a virtual machine with self-signed certificates.

This is how I have configured my storage_options:

storage_options = {
    "endpoint_url": "https://s3.name123.com/",  
    "aws_access_key_id": "some_user",
    "aws_secret_access_key": "some_pw",
    "aws_region": "us-east-1",  # necessary to avoid IMDS region warnings
}

Unfortunately, I am not able to read from (or write to) our S3 using these storage_options.

>>> dt = DeltaTable("s3://test/delta", storage_options=storage_options)

File ~/.venv/lib/python3.12/site-packages/deltalake/table.py:415, in DeltaTable.__init__(self, table_uri, version, storage_options, without_files, log_buffer_size)
    395 """
    396 Create the Delta Table from a path with an optional version.
    397 Multiple StorageBackends are currently supported: AWS S3, Azure Data Lake Storage Gen2, Google Cloud Storage (GCS) and local URI.
   (...)
    412 
    413 """
    414 self._storage_options = storage_options
--> 415 self._table = RawDeltaTable(
    416     str(table_uri),
    417     version=version,
    418     storage_options=storage_options,
    419     without_files=without_files,
    420     log_buffer_size=log_buffer_size,
    421 )

OSError: Generic S3 error: Error after 10 retries in 4.65365706s, max_retries:10, retry_timeout:180s, source:error sending request for url (https://s3.name123.com/test/delta/_delta_log/_last_checkpoint)

I think our self-signed SSL certs cause the problem here. Is it possible to disable the SSL certificate check via storage_options? There is a parameter AllowInvalidCertificates in the object_store ClientConfig, which might do the trick, but I do not know how to pass it into storage_options properly.

Note: With rclone I have to pass the flag --no-check-certificate, and when using fsspec I have to provide client_kwargs={'verify': False} to connect to our SeaweedFS S3.

legout commented 5 days ago

Never mind, I've found the solution on my own by testing several ways to set AllowInvalidCertificates in the storage_options.

This is how I got it working:

storage_options = {
    "endpoint_url": "https://s3.name123.com/",  
    "aws_access_key_id": "some_user",
    "aws_secret_access_key": "some_pw",
    "aws_region": "us-east-1",  # necessary to avoid IMDS region warnings
    "allow_invalid_certificates": "true",
}
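For readers landing here with the same problem, a minimal end-to-end sketch putting the fix together (the endpoint, bucket, and credentials are placeholders from this thread, not real values; `allow_invalid_certificates` disables TLS certificate verification entirely, so only use it against endpoints you control):

```python
# Sketch: configuring deltalake to talk to an S3-compatible store that
# presents a self-signed certificate. All values below are placeholders.
storage_options = {
    "endpoint_url": "https://s3.name123.com/",
    "aws_access_key_id": "some_user",
    "aws_secret_access_key": "some_pw",
    "aws_region": "us-east-1",  # avoids IMDS region-lookup warnings
    # Must be the string "true", not the Python bool True: all
    # storage_options values are passed through as strings.
    "allow_invalid_certificates": "true",
}

# With the option set, reads and writes work against the self-signed endpoint:
# from deltalake import DeltaTable
# dt = DeltaTable("s3://test/delta", storage_options=storage_options)
# df = dt.to_pandas()
```

The key detail is that the option name is the snake_case form of object_store's AllowInvalidCertificates config key, and its value must be a string, since storage_options is a flat string-to-string map.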