BemiHQ / BemiDB

Postgres read replica optimized for analytics
https://bemidb.com
GNU Affero General Public License v3.0

Support custom S3-compatible endpoint #4

Open bnol opened 2 weeks ago

bnol commented 2 weeks ago

Support custom S3-compatible endpoint such as min.io

exAspArk commented 2 weeks ago

Hey @bnol — thanks for submitting this issue!

I've never used min.io, but because it is S3-compatible, it might just work with BemiDB out of the box.

I'm not sure how authentication and authorization work. If it is compatible with the AWS S3 SDKs, would you be able to try something like this:

# us-east-1 is hardcoded in MinIO: https://github.com/minio/minio/discussions/15063
bemidb \
  --storage-type AWS_S3 \
  --iceberg-path iceberg \
  --aws-region us-east-1 \
  --aws-s3-bucket [YOUR_BUCKET] \
  --aws-access-key-id [MINIO_ROOT_USER] \
  --aws-secret-access-key [MINIO_ROOT_PASSWORD] \
  start

renatocron commented 2 weeks ago

It's necessary to configure the endpoint as well, not just the region, because otherwise the SDK will point to the AWS endpoints.

It would be something like this:

// Add these fields to your Config struct
type Config struct {
    Aws struct {
        AccessKeyId     string
        SecretAccessKey string
        Region          string
        S3Bucket        string
        Endpoint        string // New field for custom endpoint
        ForcePathStyle  bool   // New field for path-style addressing
    }
    // ... other existing fields
}

func NewS3Storage(config *Config) *StorageS3 {
    awsCredentials := credentials.NewStaticCredentialsProvider(
        config.Aws.AccessKeyId,
        config.Aws.SecretAccessKey,
        "",
    )

    var logMode aws.ClientLogMode
    // if config.LogLevel == LOG_LEVEL_DEBUG {
    //     logMode = aws.LogRequest | aws.LogResponse
    // }

    // Create custom endpoint resolver if endpoint is specified
    var endpointResolver aws.EndpointResolverWithOptions
    if config.Aws.Endpoint != "" {
        endpointResolver = aws.EndpointResolverWithOptionsFunc(func(service, region string, options ...interface{}) (aws.Endpoint, error) {
            return aws.Endpoint{
                URL:               config.Aws.Endpoint,
                SigningRegion:     config.Aws.Region,
                HostnameImmutable: true,
            }, nil
        })
    }

    // Load AWS config with custom options
    configOptions := []func(*awsConfig.LoadOptions) error{
        awsConfig.WithRegion(config.Aws.Region),
        awsConfig.WithCredentialsProvider(awsCredentials),
        awsConfig.WithClientLogMode(logMode),
    }

    // Add custom endpoint resolver if specified
    if endpointResolver != nil {
        configOptions = append(configOptions, 
            awsConfig.WithEndpointResolverWithOptions(endpointResolver))
    }

    loadedAwsConfig, err := awsConfig.LoadDefaultConfig(
        context.Background(),
        configOptions...,
    )
    PanicIfError(err)

    // Path-style addressing is configured on the S3 client options
    // (s3.Options.UsePathStyle); MinIO typically requires it
    s3Client := s3.NewFromConfig(loadedAwsConfig, func(o *s3.Options) {
        o.UsePathStyle = config.Aws.ForcePathStyle
    })

    return &StorageS3{
        s3Client:    s3Client,
        config:      config,
        storageBase: &StorageBase{config: config},
    }
}

With these few options, it would be possible to use the SDK with any S3-compatible service that supports the AWS S3 API. For example:

  1. For MinIO:

     aws:
       endpoint: "http://minio-server:9000"
       region: "us-east-1"              # Can be any valid region
       s3_bucket: "your-bucket"
       access_key_id: "your-access-key"
       secret_access_key: "your-secret-key"
       force_path_style: true           # MinIO typically requires path-style addressing

  2. For Backblaze B2 / Digital Ocean / Wasabi / etc.:

     aws:
       endpoint: "https://s3.us-west-001.backblazeb2.com"  # Use your B2 region
       region: "us-west-001"            # B2 region
       s3_bucket: "your-bucket"
       access_key_id: "your-keyID"
       secret_access_key: "your-applicationKey"
       force_path_style: false          # B2 uses virtual-hosted-style addressing

bnol commented 2 weeks ago

@renatocron perhaps you should submit a PR? :D

exAspArk commented 1 week ago

"It's necessary to configure the endpoint"

Ah, you're right, good point! We, unfortunately, won't be able to prioritize it in the coming days because we'll be building some of the things mentioned in our Future Roadmap first (e.g., native support for complex data structures like JSON and arrays).

But if anyone would like to submit a PR, we'll be very happy to help get it merged. At a high level, here are the main things that would need to be updated:

  1. Add support for a new argument like --aws-s3-endpoint, similar to the other arguments (see config.go)
  2. Write: pass the custom endpoint as BaseEndpoint when creating an S3 client (see NewFromConfig in the code; sketched below)
  3. Read: add ENDPOINT to the DuckDB secret so that Iceberg tables can be read (see aws_s3_secret in the code; sketched below)
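
For anyone picking this up, here's a rough sketch of what steps 2 and 3 could look like, building on the snippet earlier in this thread. It assumes a hypothetical config.Aws.S3Endpoint field added in step 1 (the actual flag and field names are up to whoever implements it) and reuses the loadedAwsConfig / config variables from NewS3Storage; BaseEndpoint on s3.Options is the SDK's newer replacement for custom endpoint resolvers:

// Step 2 (write path): pass the custom endpoint as BaseEndpoint on the S3 client.
// BaseEndpoint and UsePathStyle are both fields on s3.Options in aws-sdk-go-v2;
// path-style addressing is typically required for MinIO-style endpoints.
s3Client := s3.NewFromConfig(loadedAwsConfig, func(o *s3.Options) {
    if config.Aws.S3Endpoint != "" {
        o.BaseEndpoint = aws.String(config.Aws.S3Endpoint)
        o.UsePathStyle = true
    }
})

// Step 3 (read path): include ENDPOINT (and URL_STYLE) in the DuckDB S3 secret
// so that reading Iceberg tables goes through the same S3-compatible service.
// DuckDB expects the endpoint without a scheme; plain-http endpoints would also
// need USE_SSL set to false.
endpoint := strings.TrimPrefix(strings.TrimPrefix(config.Aws.S3Endpoint, "https://"), "http://")
awsS3SecretSql := fmt.Sprintf(`
    CREATE OR REPLACE SECRET aws_s3_secret (
        TYPE S3,
        KEY_ID '%s',
        SECRET '%s',
        REGION '%s',
        ENDPOINT '%s',
        URL_STYLE 'path'
    )`,
    config.Aws.AccessKeyId,
    config.Aws.SecretAccessKey,
    config.Aws.Region,
    endpoint,
)

Keeping AWS as the default when no endpoint is configured would leave existing setups unaffected, while MinIO-style deployments opt into path-style addressing and the scheme-less endpoint DuckDB expects.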